Rain Lag

The Paper-Circuit On-Call Theater: Rehearsing Production Incidents With Index Cards, Floor Tape, and Zero Screens

How to use low-tech, theatrical, paper-based simulations to rehearse production incidents, de-risk your on-call practice, and build a stronger culture of incident readiness—no laptops required.


Most teams practice incident response in front of monitors: dashboards, runbooks, ticket queues, and endless Slack scrolls. But what if you turned all the screens off—and still rehearsed production incidents effectively?

Welcome to the Paper-Circuit On-Call Theater: a screen-free, physical way to simulate outages using nothing more than index cards, floor tape, printed diagrams, and some imagination.

This isn’t a cute retro gimmick. Physical, low-fidelity simulations can teach many incident response concepts as effectively as screen-based tools, and for the human side of an incident they arguably work better. By treating incidents as theater rather than a sequence of shell commands, you put the emphasis where it belongs when systems are on fire: roles, handoffs, and communication.


Why Act Out Incidents Instead of Clicking Through Them?

When there’s a real outage, the tools you use are important. But they aren’t the hardest part. The hardest part is everything around the tools:

  • Who declares the incident?
  • Who talks to stakeholders?
  • Who decides to roll back vs. roll forward?
  • How do handoffs work when shifts change mid-incident?
  • How do you avoid five people debugging the same thing in silence?

Physical simulations strip away dashboards and terminals so you can focus on the humans and the process.

Key reasons this works

1. It emphasizes roles over keystrokes
When you rehearse as a kind of theater, people inhabit roles: Incident Commander, Communications Lead, Ops/SRE, Product Rep, Observer, etc. The focus shifts from “What command do I type?” to “What decision do I make, and who do I tell?”

2. It’s low-stakes and low-fidelity by design
An index card can represent a database. A piece of tape can represent a service boundary. Because nothing is real, it’s safe to fail. You can try bad ideas, explore weird edge cases, and pause to ask, “What if we did this differently?”—with zero production risk.

3. It exposes hidden failure modes
In a live incident, you’re too busy to notice all the small friction points. In a paper simulation, you have time to see them: missing runbooks, unclear escalation paths, ambiguous ownership, monitoring blind spots, or decision paralysis.

4. It’s memorable and fun
Mixing SRE practice with game-like or theatrical elements turns dry training into something people actually remember. The more engaging the exercise, the more likely your team is to retain the lessons and apply them at 3 a.m. on a Sunday.

5. It’s cheap and accessible
You don’t need licenses, special tools, or elaborate environments. Just:

  • Index cards or sticky notes
  • Floor tape or whiteboard tape
  • Printed diagrams of your architecture
  • Markers and pens
  • A room (or even a hallway)

Any team can do this, from a three-person startup to a large SRE org.


Setting the Stage: How to Build Your Paper-Circuit Theater

Think of your simulation as a play. You’re going to:

  1. Design the set (your system and environment)
  2. Cast the roles (your incident team)
  3. Write the script beats (the incident scenario)
  4. Run the performance (the exercise)
  5. Hold a post-show retrospective (the debrief)

1. Design the set: Make your system physical

Use a conference room (or any open space) as your “production environment.”

  • Floor tape: Mark out areas for different subsystems or services: frontend, payments, auth, databases, message queue, etc.
  • Index cards: Each card is a resource or event:
    • Services: User Service, Order Service, Auth Service
    • Infrastructure: Postgres Cluster, Redis Cache, Kafka Topic
    • Events: Spike in 500s, CPU 95%, DB connection timeout, PagerDuty alert
  • Printed diagrams: Post a simplified architecture diagram on the wall. This anchors the space and helps participants orient themselves.

The goal is not perfect accuracy. The goal is to create a shared mental model of your system that participants can literally walk through.

2. Cast the roles

Assign roles explicitly, even if some people play multiple parts:

  • Incident Commander (IC) – Owns decisions and flow. Not necessarily the most senior engineer.
  • Operations / SRE – Investigates, proposes mitigations, and performs technical actions (verbally, in the simulation).
  • Communications Lead – Owns updates to stakeholders, customers, or management.
  • Service Owners – Represent specific domains like Database, Networking, Payments, etc.
  • Observer / Facilitator – Guides the simulation, injects new events, tracks timing, and takes notes.

Give each person a card with their role description so they stay in character.

3. Write the scenario beats

You don’t need a full script; you need beats—key turning points in the incident.

Example: “Checkout Latency Spikes” scenario

  • Beat 1: A monitoring alert fires on checkout latency (index card delivered to the IC).
  • Beat 2: Support reports customers stuck at payment screen.
  • Beat 3: Auth service error rate climbs.
  • Beat 4: Database connections maxed out.
  • Beat 5: A deploy was just rolled out to the checkout service.
  • Beat 6: Rolling back helps, but errors persist from a downstream dependency.

For each beat, the facilitator has prepared cards with symptoms or data (think log snippets, metrics summaries, customer impact notes). They introduce them at timed intervals or in response to team actions.
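If you prefer to prepare the facilitator’s deck ahead of time, the beat-and-trigger idea can be written down as data. The sketch below is one possible way to do it in Python, assuming each card is released either at a set minute or when a spoken action contains a keyword. The `Beat` class, the `due_beats` and `checkout_latency_deck` helpers, and every timing and keyword are hypothetical; the session itself stays screen-free, and the output is only a prep aid for writing the physical cards.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Beat:
    """One facilitator card: what it says and what makes it appear."""
    card_text: str
    at_minute: Optional[int] = None  # reveal once this many minutes have elapsed
    on_action: Optional[str] = None  # or reveal when a spoken action contains this keyword
    revealed: bool = False


def due_beats(deck: List[Beat], elapsed_min: int, spoken_actions: List[str]) -> List[Beat]:
    """Return every unrevealed beat whose trigger has fired, marking it revealed."""
    due = []
    for beat in deck:
        if beat.revealed:
            continue
        timed = beat.at_minute is not None and elapsed_min >= beat.at_minute
        prompted = beat.on_action is not None and any(
            beat.on_action in action.lower() for action in spoken_actions
        )
        if timed or prompted:
            beat.revealed = True
            due.append(beat)
    return due


def checkout_latency_deck() -> List[Beat]:
    """Hypothetical deck for the 'Checkout Latency Spikes' beats; triggers are illustrative."""
    return [
        Beat("ALERT: checkout latency is above threshold", at_minute=0),
        Beat("Support: customers are stuck at the payment screen", at_minute=5),
        Beat("Auth service error rate is climbing", at_minute=10),
        Beat("Database connections are maxed out", on_action="database"),
        Beat("A deploy was just rolled out to the checkout service", on_action="change"),
        Beat("Rollback helps, but errors persist from a downstream dependency", on_action="roll back"),
    ]


deck = checkout_latency_deck()
for minute, actions in [(0, []), (7, ["I want to check the database for connection saturation"])]:
    for beat in due_beats(deck, minute, actions):
        print(f"[{minute:>2} min] hand out: {beat.card_text}")
```

Keyword matching on spoken actions is deliberately crude; in the room, the facilitator’s judgment decides when a card has been earned.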


Running the Show: A Screen-Free Incident Rehearsal

Here’s a simple structure for a 60–90 minute session. The four steps below add up to roughly 50–75 minutes, leaving some time to begin the debrief.

Step 1: Briefing (10–15 minutes)

  • Explain the rules: No laptops, no real tools. Everything is represented via the room and cards.
  • Clarify goals: Practice communication, roles, decision-making—not memorizing commands.
  • Describe the system at a high level: Walk through the taped areas and diagrams.

Step 2: Incident starts (20–30 minutes)

The facilitator kicks things off:

  1. Trigger: Hand the IC an “Alert” card. The IC declares the incident and assigns roles if not pre-assigned.
  2. Communication practice: The Communications Lead must periodically deliver a “status update” to a mock stakeholder (could be another participant or the facilitator):
    • What we know
    • What we’re doing
    • Estimated impact
  3. Investigation:
    • Team members physically move between taped areas to represent “looking at” different parts of the system.
    • Facilitator hands them new cards with additional clues or complications as they investigate.

The team speaks their actions aloud, e.g., “I want to check the database for connection saturation” or “I’m rolling back the last deploy.” The facilitator responds with prepared outcomes or improvised consequences.
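Facilitators can keep the prepared outcomes on a single sheet, keyed by what the team is likely to say. This hypothetical Python sketch matches a spoken action against keyword sets and falls back to improvisation when nothing fits. The `PREPARED_OUTCOMES` table, the `respond` helper, and the outcome text are illustrative assumptions based on the checkout scenario, not a prescribed format.

```python
from typing import List, Tuple

# Hypothetical facilitator sheet: (keywords that must all appear, prepared outcome).
PREPARED_OUTCOMES: List[Tuple[Tuple[str, ...], str]] = [
    (("database", "connection"), "Database connections are maxed out; new requests are queueing."),
    (("roll", "back"), "Rollback completes; latency improves, but errors persist from a downstream dependency."),
]

FALLBACK = "No prepared outcome: improvise a plausible consequence and note it for the debrief."


def respond(action: str, outcomes: List[Tuple[Tuple[str, ...], str]] = PREPARED_OUTCOMES) -> str:
    """Return the first prepared outcome whose keywords all appear in the spoken action."""
    text = action.lower()
    for keywords, outcome in outcomes:
        # Substring match, so "roll" also matches "rolling".
        if all(word in text for word in keywords):
            return outcome
    return FALLBACK


print(respond("I want to check the database for connection saturation"))
print(respond("I'm rolling back the last deploy."))
print(respond("Let's restart the cache."))
```

The fallback line matters as much as the prepared ones: improvised consequences are often where the most useful debrief discussion comes from, so it is worth writing them down.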

Step 3: Escalations, handoffs, tradeoffs (15–20 minutes)

Introduce more realistic challenges:

  • A key expert is “unavailable” for 10 minutes.
  • The IC’s shift ends and they must hand off to a new IC.
  • A stakeholder demands an ETA or suggests a risky quick fix.
  • The team must choose between a partial mitigation now and a deeper fix that takes longer.

These moments reveal process gaps:

  • Does anyone know how to hand off the IC role smoothly?
  • Who has the authority to stop a risky change?
  • Is there clarity on SLOs, error budgets, or acceptable risk?
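When the tradeoff beat involves an error budget, a facilitator card stating how much budget is left makes the decision concrete. The sketch below assumes a simple time-based availability SLO, where the budget is (1 − SLO) × window length in minutes; many teams use request-based SLOs instead, so treat the helper names and the 25-minute outage as hypothetical.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of full downtime a time-based availability SLO allows in the window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction strictly between 0 and 1, e.g. 0.999")
    window_minutes = window_days * 24 * 60
    return (1 - slo) * window_minutes


def budget_remaining(slo: float, downtime_min: float, window_days: int = 30) -> float:
    """Budget left after downtime_min minutes of outage (never below zero)."""
    return max(0.0, error_budget_minutes(slo, window_days) - downtime_min)


# A 99.9% SLO over 30 days: 0.001 * 43,200 minutes = 43.2 minutes of budget.
print(f"Total budget: {error_budget_minutes(0.999):.1f} min")
# After a hypothetical 25 minutes of checkout outage, about 18.2 minutes remain.
print(f"Remaining:    {budget_remaining(0.999, 25):.1f} min")
```

Writing the remaining minutes on a card lets the team weigh "partial mitigation now" against "deeper fix later" in the same units a real on-call engineer would.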

Step 4: Resolution and wrap-up (5–10 minutes)

Once the team lands on an effective mitigation or fix, the facilitator declares the incident resolved.

The Communications Lead delivers a final summary: impact, timeline, mitigation, and next steps—mirroring the structure of a real incident report.
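If you want a reusable template for that final summary, the four parts can be captured in a small structure and printed as a card before the session, or filled in afterward from the facilitator’s notes. This is a minimal Python sketch; the `IncidentSummary` class, the relative timestamps, and the example content are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class IncidentSummary:
    """Fields the Communications Lead reads out at the end of the exercise."""
    title: str
    impact: str
    timeline: List[Tuple[str, str]]  # (relative time, event), e.g. ("T+0", "Alert fires")
    mitigation: str
    next_steps: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Lay the summary out as plain text that fits on a printed card."""
        lines = [f"INCIDENT SUMMARY: {self.title}", "", f"Impact: {self.impact}", "", "Timeline:"]
        lines += [f"  {when:<6} {what}" for when, what in self.timeline]
        lines += ["", f"Mitigation: {self.mitigation}", "", "Next steps:"]
        # An empty next-steps list is called out explicitly rather than left blank.
        lines += [f"  - {step}" for step in self.next_steps] or ["  (none recorded)"]
        return "\n".join(lines)


summary = IncidentSummary(
    title="Checkout latency spikes",
    impact="Customers stuck at the payment screen during checkout",
    timeline=[("T+0", "Checkout latency alert fires"), ("T+20", "Checkout deploy rolled back")],
    mitigation="Rolled back the checkout deploy; downstream dependency still under investigation",
    next_steps=["Add an alert for database connection saturation"],
)
print(summary.render())
```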


After the Curtain Falls: Debrief and Learning

The real value appears in the debrief.

Use questions like:

  • Where did communication work well? Where did it fail?
  • Were roles clear? Did anyone feel stuck or overloaded?
  • What information did we wish we had? Was that a monitoring gap or a documentation gap?
  • Which decisions felt hard? What would have made them easier?
  • If this had been a real incident, what would we change about our playbooks or escalation paths?

Translate findings into concrete improvements:

  • Update runbooks and incident playbooks.
  • Clarify ownership for critical services.
  • Add missing alerts or dashboards.
  • Define or refine roles and responsibilities for real on-call rotations.

Run regularly, these tabletop-style paper simulations become a powerful feedback loop connecting:

  • System design (resilience, observability)
  • Team process (on-call, incident response)
  • Culture (psychological safety, learning from failure)

Why This Low-Tech Approach Actually Reduces Real On-Call Stress

When people know they’ve rehearsed scenarios—without the pressure of real customer impact—on-call stops feeling like a terrifying black box.

Paper-circuit rehearsals:

  • Build muscle memory for how to start, run, and close an incident.
  • Normalize asking for help and delegating roles.
  • Make it easier to speak up during real emergencies because you’ve already “performed” the script.
  • Reveal both technical failure modes (single points of failure, brittle dependencies) and workflow failure modes (confused escalation paths, unclear decision authority).

The next time the pager goes off, your team is less likely to panic. They’ve seen a version of this play before.


Getting Started: A Simple First Exercise

You don’t need to design a Hollywood-level production. Start small:

  1. Pick one common incident type (e.g., elevated error rates on your main API).
  2. Draw a rough architecture diagram on a whiteboard.
  3. Use index cards for:
    • Alerts (from monitoring, support, customers)
    • Symptoms (metrics, logs, traces summarized in plain language)
    • Dependencies (databases, external APIs, caches)
  4. Run a 45-minute session with:
    • 1 IC
    • 1 Communications Lead
    • 2–4 responders
    • 1 facilitator/observer
  5. Debrief for 20 minutes and pick one or two improvements to implement.

Repeat monthly. Vary the scenarios. Rotate roles. Over time, you build a culture where rehearsing incidents is normal, expected, and—dare we say—enjoyable.


Conclusion: Theater as a Tool for Reliability

Reliability isn’t just about better dashboards or more powerful automation. It’s about how humans coordinate under stress.

The Paper-Circuit On-Call Theater gives you a low-cost, repeatable, and surprisingly fun way to:

  • Practice incidents without risking production
  • Emphasize communication, clarity, and decision-making over tooling specifics
  • Uncover hidden gaps in monitoring, escalation, and ownership
  • Make on-call less mysterious and more humane

All you need is some floor tape, index cards, and a willingness to treat incident response as something you can rehearse—long before the curtain rises on the real thing.
