Rain Lag

The Cardboard Control Tower: Prototyping Safer Incidents With Disposable Paper War Rooms

How low-cost, cardboard-and-paper “war rooms” can turn incident response from a dry tabletop exercise into a realistic, collaborative design practice that builds true resilience.

If your incident practice consists of slide decks, a conference room, and a few bored people saying, “I’d probably do X,” you are leaving reliability to chance.

Real incidents are messy. People are distributed across time zones, tools are half-broken, Slack is on fire, and your pristine runbooks suddenly look like fiction. Yet most organizations still “practice” with abstract tabletop exercises that bear little resemblance to the chaos of the real thing.

There is a better way: build a cardboard control tower.

Not an actual tower, of course, but a physical, disposable “paper war room” where you prototype incidents using cardboard, markers, and sticky notes. It’s cheap, low-stakes, and surprisingly powerful.

This post walks through why physical simulations work, how to run them, and how to use them to evolve incident response from a checkbox ritual into a core reliability practice.


Why Traditional Tabletop Exercises Fall Short

Classic tabletop exercises usually look like this:

  • A slide deck describing a hypothetical outage
  • A single facilitator driving the scenario
  • A handful of people in a room (or on a Zoom call) talking through “what we’d do”

These sessions are:

  • Too abstract – People describe idealized behavior, not what actually happens when they are tired, paged at 3 a.m., or missing context.
  • Too centralized – Everyone is in one (virtual) room, but real incidents are distributed: some people are on-call, some are commuting, others are half-present in multiple channels.
  • Too linear – Incidents rarely follow a single storyline; information arrives late, tools misbehave, and decision paths branch.

As a result, you get theory, not muscle memory. People leave feeling they “passed the exercise,” but the organization has not truly rehearsed how it will behave under pressure.


Enter the Paper War Room: Making Incidents Tangible

A paper war room is a physical simulation space where your incident unfolds across walls and tables instead of slides. Think:

  • A whiteboard or wall becomes your system map.
  • Cardboard shapes become services, teams, or communication channels.
  • Sticky notes represent events, alerts, decisions, and handoffs.

This “cardboard control tower” approach changes the feel of practice:

  • Tangible: You see the incident. You can point to it, move it, group it, and feel the complexity.
  • Collaborative: People stand, walk, cluster, and talk. It becomes a team sport instead of a turn-based Q&A.
  • Low-stakes: It is just cardboard and paper. Nothing official is being broken; everything can be rearranged.

You are no longer talking about incident response. You are rehearsing it, physically.


Why Disposable Artifacts Are a Feature, Not a Bug

The key to cardboard control towers is that everything is disposable and easily reconfigurable:

  • Services: index cards or cardboard rectangles
  • Roles: colored badges or sticky notes
  • Communication paths: string, arrows, or marker lines
  • Events: timestamps on sticky notes

Because nothing feels permanent, teams are more willing to:

  • Experiment with new flows – “What if we changed who triages first?” Move a card and see what happens.
  • Redesign roles safely – “What if we had a separate status-commander?” Add a sticky note and try it for a run.
  • Challenge assumptions – “Do we really need four approvals here?” Cross it out and simulate the impact.

You are treating incident response like a design problem, not a sacred process. That mindset is what keeps runbooks fresh and aligned with reality.


How to Run a Cardboard Control Tower Session

You do not need expensive tools to get started. You need:

  • A room with writable surfaces (whiteboard or big paper rolls)
  • Index cards, sticky notes, markers, tape
  • A facilitator and a few participant roles (e.g., incident commander, comms, domain experts)

1. Map the System and the People

Start by mapping the world you are simulating:

  • Put core services on the wall as cards (API, DB, auth, payment, etc.).
  • Add teams who own each service.
  • Draw communication channels: Slack incidents channel, on-call phone, status page, ticketing, etc.

Do not aim for perfect architecture diagrams. You want a working map of how information and responsibility flow.
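If you want a digital copy of the wall to carry between sessions, a minimal sketch might look like the one below. The service names come from the example cards above; the team names and the `unowned_services` helper are hypothetical placeholders, not part of any real tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WallMap:
    """A rough digital copy of the wall: service cards, owners, channels."""
    owners: Dict[str, Optional[str]] = field(default_factory=dict)
    channels: List[str] = field(default_factory=list)

    def unowned_services(self) -> List[str]:
        """Return service cards that have no team card attached."""
        return sorted(s for s, team in self.owners.items() if team is None)


# Hypothetical example: the team names are placeholders.
wall = WallMap(
    owners={"API": "platform", "DB": "data", "auth": None, "payment": "billing"},
    channels=["Slack incidents channel", "on-call phone", "status page", "ticketing"],
)
print(wall.unowned_services())
```

A service card with no owner is exactly the kind of gap the wall tends to expose, so it is worth checking before the scenario starts.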

2. Choose a Scenario

Pick a realistic but not catastrophic incident, such as:

  • Latency spike on a key API
  • Partial outage in one region
  • Misconfigured feature flag causing customer impact

Define a simple starting state on sticky notes:

  • “09:00 – Alert: API latency > 2s in us-east-1”
  • “09:02 – Customer support reports login failures”

3. Simulate in Real (or Compressed) Time

Run the scenario in time-boxed steps (e.g., each 5-minute step covers 5 minutes of incident time, or more if you are compressing the clock):

  • The facilitator introduces new events: alarms, logs, customer reports, or tool failures.
  • Participants respond using only the communication paths and roles they actually have.
  • Every action is represented physically: moving a card, placing a sticky note, drawing a line.

You are aiming to answer: How does the incident actually propagate through our system and our organization?
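When the clock is compressed, the facilitator needs to know what time to write on each new sticky note. The helper below is one way to work that out; `incident_clock` and its `compression` parameter are hypothetical names, and a factor of 1.0 means real time.

```python
from datetime import datetime, timedelta


def incident_clock(start: str, real_minutes: float, compression: float = 1.0) -> str:
    """Return the incident-time "HH:MM" after `real_minutes` of real time.

    `compression` is how many incident minutes pass per real minute.
    """
    if compression <= 0:
        raise ValueError("compression must be positive")
    began = datetime.strptime(start, "%H:%M")
    now = began + timedelta(minutes=real_minutes * compression)
    return now.strftime("%H:%M")


print(incident_clock("09:00", 5))       # real time: 09:05
print(incident_clock("09:00", 5, 3.0))  # 3x compression: 09:15
```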

4. Track Decisions, Delays, and Confusion

As the scenario unfolds, capture:

  • Where did decisions get stuck?
  • Where was information missing, late, or duplicated?
  • Which roles were overloaded or unclear?

Mark these moments with sticky notes in distinct colors (e.g., red for delays, orange for confusion, blue for “surprising workaround”). These notes are your best source of improvements.
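Once the run is over, it can help to transcribe the colored notes and count them before the debrief. The sketch below uses the color legend from the text; the note contents and roles are made-up examples.

```python
from collections import Counter

# Color legend taken from the text above.
LEGEND = {"red": "delay", "orange": "confusion", "blue": "surprising workaround"}

# Hypothetical notes transcribed from a wall: (color, role involved, remark).
notes = [
    ("red", "incident commander", "waited for approval to roll back"),
    ("orange", "comms", "unclear who updates the status page"),
    ("red", "incident commander", "could not reach the DB owner"),
    ("blue", "domain expert", "used a read replica to check data"),
]

# How many notes of each kind ended up on the wall.
by_kind = Counter(LEGEND[color] for color, _, _ in notes)

# Which roles collected the most delay and confusion notes.
by_role = Counter(role for color, role, _ in notes if color in ("red", "orange"))

print(by_kind.most_common())
print(by_role.most_common(1))
```

A role that collects most of the red and orange notes is a candidate for being overloaded or unclear, which feeds directly into the debrief questions.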

5. Debrief as Designers, Not Prosecutors

After the run, step back and treat the wall like a prototype:

  • What would we simplify, merge, or remove?
  • Where should we add automation or clearer ownership?
  • Which runbooks matched reality, and which are clearly outdated?

Turn the insights directly into changes—updated runbooks, clearer roles, modified escalation paths—and plan to test them in the next drill.


From Checkbox Tabletop to Core Reliability Practice

Most organizations treat drills as a compliance checkbox:

  • Once-a-year tabletop
  • Sign the attendance sheet
  • File the slides

Cardboard control towers invite you to treat incident practice as iterative craft:

  • Recurring drills: Short, structured sessions—monthly or quarterly—focused on specific failure modes.
  • Runbook rehearsals: Take a single runbook and walk it physically, step by step. Where do people get stuck? What information is assumed but not available?
  • Time-boxed simulations: 60–90 minute drills where the clock matters; practice making “good enough” decisions under pressure.

Teams that do this consistently tend to:

  • Preserve SLOs more effectively, because they know how to act early.
  • Shorten outages, because coordination patterns are practiced, not invented on the fly.
  • Build psychological safety, because people have seen the movie before and know their role.

Building Real Resilience and Muscle Memory

Intellectual understanding is not the same as operational readiness.

Attack and response simulations that mimic real-world conditions—time pressure, incomplete information, tool failures—are what build true resilience. Paper war rooms give you a safe sandbox for exactly that:

  • Want to see what happens when your primary incident channel is noisy or down? Cross it off the wall and reroute.
  • Curious how a partial team (holiday, illness, timezone gaps) would cope? Remove some role cards and run the scenario.
  • Wondering if a new role (e.g., customer liaison) would help? Introduce it mid-simulation and see.

Because everything is cardboard and paper, you can explore “breaking changes” without risk—and then selectively carry the best patterns back into production.

Over time, your team’s on-call readiness shifts from “I think I know what I’d do” to “We’ve actually practiced this pattern.”


Getting Started Next Week

You do not need executive sponsorship to start small. Try this:

  1. Pick one service you care about.
  2. Invite 4–6 people: at least one on-caller, one team lead, and someone from support or product.
  3. Reserve 90 minutes in a room with a whiteboard.
  4. Gather supplies: index cards, markers, sticky notes, tape.
  5. Run a single, modest scenario and focus on one question: “Where did communication or coordination break down?”

Document just three outcomes, one per category:

  • One thing to remove (a redundant step, approval, or tool).
  • One thing to clarify (ownership, escalation path, or communication channel).
  • One thing to practice again (a particularly tricky handoff or diagnostic step).

Repeat in a month. Adjust the cardboard, adjust the process, and watch your incident response become sharper.


Conclusion: Reliability as a Hands-On Design Discipline

Incidents will never be fully predictable. But your response can be.

By turning incident practice into a hands-on, iterative design process—complete with cardboard, paper, and markers—you:

  • Make invisible systems and social dynamics visible.
  • Uncover stale assumptions and out-of-date runbooks.
  • Build real muscle memory across distributed teams.
  • Treat reliability not as a one-time project, but as an evolving craft.

The cardboard control tower is not about arts and crafts. It is about prototyping safer incidents before the real ones hit.

If your current tabletop exercises feel too polished and detached from reality, grab a marker, some cardboard, and a wall. Your next outage will thank you.
