Rain Lag

The Analog Incident Shadowbox: Building a Paper Diorama of Your Next Outage Before It Happens

How paper dioramas, pre‑mortems, and low‑tech “shadowboxes” can help engineering and security teams visualize failures, stress‑test assumptions, and improve incident readiness before anything breaks.

Modern systems fail in deeply modern ways: distributed, opaque, and at high speed. But one of the most powerful tools for understanding those failures is decidedly low‑tech: paper, markers, tape, and a room full of humans.

This post explores the idea of an “analog incident shadowbox”: a physical paper diorama of a hypothetical outage that you build before it happens. Paired with pre‑mortems and tabletop simulations, this simple practice can improve how your team anticipates, prevents, and responds to real incidents.


What Is a Pre‑Mortem (and Why It Beats Post‑Mortems Alone)

Most teams are familiar with post‑mortems: an incident happens, everyone scrambles, and afterward you analyze what went wrong. That’s important, but it’s also reactive—learning after you’ve already paid the price.

A pre‑mortem flips the script:

  • You assume a near‑future failure has already occurred (e.g., “It’s three months from now and our primary data center has been offline for 12 hours”).
  • You work backward to ask, “What must have gone wrong for us to end up here?”

This forward‑looking analysis helps teams:

  • Proactively identify threats and weaknesses that aren’t obvious in day‑to‑day work.
  • Expose hidden dependencies and vulnerabilities in architecture and processes.
  • Sharpen planning by challenging overconfident assumptions (“Of course failover will just work”).

Pre‑mortems are powerful by themselves. But they become far more vivid—and memorable—when you make them physical.


From Whiteboard to Shadowbox: Why Make It Physical?

We’re used to diagramming systems on whiteboards or in tools like Miro and Lucidchart. Those are useful, but they often stay abstract. An incident shadowbox turns a hypothetical outage into something you can literally see and move around.

Think of it as a paper diorama of your next outage:

  • Each service, system, and external dependency is a card or cut‑out.
  • Arrows and strings show data flows, trust relationships, and failure paths.
  • Sticky notes capture events, decisions, and consequences over time.

Why this works so well:

  1. Tangibility reveals complexity
    When you have to physically place every component, you quickly see where things are dense, brittle, or unclear.

  2. Narrative clarity
    You’re not just drawing a system diagram; you’re telling a story of failure—who notices first, what breaks next, where the blast radius spreads.

  3. Shared mental model
    A physical shadowbox is accessible to engineers, security, ops, product, support, and leadership. Everyone can point at the same thing, literally.

  4. Low‑tech, high‑signal
    Paper, tape, and markers create just enough friction to force intentionality. You can’t hide complexity behind another layer or dropdown.


How to Build an Analog Incident Shadowbox

You don’t need much:

  • Large wall or whiteboard
  • Index cards or paper squares
  • Markers, string, tape, and sticky notes

Then follow a simple process.

1. Choose Your Hypothetical Disaster

Start with a clear, vivid scenario. Examples:

  • “Our primary database cluster is irrecoverably corrupted at 2 a.m.”
  • “A stolen OAuth token leads to a major data exfiltration event.”
  • “A misconfigured routing change isolates our EU region for six hours.”

Frame it as if it already happened:

“It’s September 15th. We’ve been completely down for 10 hours. Customers are furious. Regulators are asking questions. What happened?”

2. Map the System as Cards and Flows

Lay out key components as cards:

  • Core services (API gateway, auth service, payment processor)
  • Datastores and caches
  • Third‑party dependencies (CDN, identity provider, payment gateway)
  • Monitoring, logging, and alerting systems
  • People and roles (on‑call engineer, incident commander, security lead, customer support)

Use arrows or string to show:

  • Data flows (who talks to whom)
  • Trust boundaries (where security assumptions change)
  • Single points of failure (one card that everything flows through)
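
Once the cards are on the wall, it can help to copy the flows into a small script and ask which card would take the most others down with it. The sketch below is one way to do that, not a prescribed method. The card names and flows are hypothetical, and it assumes every flow is a hard dependency (no redundancy or fallback is modeled), so treat the ranking as a prompt for discussion rather than a verdict.

```python
from collections import defaultdict

# Hypothetical flows copied off the wall: (caller, callee) means
# "caller needs callee to do its job".
FLOWS = [
    ("api_gateway", "auth_service"),
    ("api_gateway", "payment_processor"),
    ("auth_service", "identity_provider"),
    ("auth_service", "session_cache"),
    ("auth_service", "primary_db"),
    ("payment_processor", "payment_gateway"),
    ("payment_processor", "primary_db"),
]


def direct_dependents(flows):
    """Map each card to the set of cards that call it directly."""
    dependents = defaultdict(set)
    for caller, callee in flows:
        dependents[callee].add(caller)
    return dependents


def blast_radius(card, flows):
    """Return every card that depends on `card`, directly or transitively."""
    dependents = direct_dependents(flows)
    seen = set()
    stack = [card]
    while stack:
        current = stack.pop()
        for parent in dependents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    seen.discard(card)  # a card in a cycle should not count itself
    return seen


def rank_by_blast_radius(flows):
    """List (card, affected cards), largest blast radius first."""
    cards = {card for flow in flows for card in flow}
    ranked = [(card, sorted(blast_radius(card, flows))) for card in cards]
    return sorted(ranked, key=lambda item: (-len(item[1]), item[0]))


if __name__ == "__main__":
    for card, affected in rank_by_blast_radius(FLOWS):
        print(f"{card}: {len(affected)} dependent card(s) {affected}")
```

Cards near the top of the list are candidates for the “one card that everything flows through” and deserve a closer look on the wall.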

3. Run a Pre‑Mortem Through the Shadowbox

Now walk through the scenario in time order:

  1. Trigger: What is the first thing that goes wrong? Move or flip a card to show it failing.
  2. Propagation: What fails next because of that? Break arrows, add sticky notes for new errors.
  3. Detection: Who notices first? Where does the first alert fire (if at all)?
  4. Response: What does the on‑call do? Where do they look? What tools don’t behave as expected?
  5. Escalation: Who gets pulled in? Which teams are now in the story?
  6. Customer impact: Where does the outage surface externally? API errors? Slow dashboards? Data inconsistency?
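
The walk-through above can also be written down as a rough timeline so the group can see how long each phase takes to begin. The following is a minimal sketch under stated assumptions: the phase names mirror the six steps, while the minute estimates and notes are invented for illustration and would come from the people in the room.

```python
from dataclasses import dataclass

# Phase names mirror the six steps of the walk-through.
PHASES = (
    "trigger",
    "propagation",
    "detection",
    "response",
    "escalation",
    "customer_impact",
)


@dataclass(frozen=True)
class TimelineEvent:
    minute: int  # minutes since the trigger, as estimated by the group
    phase: str   # one of PHASES
    note: str    # what the sticky note on the wall says

    def __post_init__(self):
        if self.phase not in PHASES:
            raise ValueError(f"unknown phase: {self.phase}")


def first_minute(events, phase):
    """Return the earliest minute at which `phase` appears, or None."""
    minutes = [event.minute for event in events if event.phase == phase]
    return min(minutes) if minutes else None


def gap(events, start_phase, end_phase):
    """Minutes between the first start_phase and first end_phase event."""
    start = first_minute(events, start_phase)
    end = first_minute(events, end_phase)
    if start is None or end is None:
        return None
    return end - start


# Hypothetical estimates for the "EU region isolated" scenario.
EVENTS = [
    TimelineEvent(0, "trigger", "Routing change is pushed to the EU region"),
    TimelineEvent(4, "propagation", "EU API gateway loses its path to auth"),
    TimelineEvent(6, "customer_impact", "EU customers start seeing API errors"),
    TimelineEvent(25, "detection", "Support escalates the first customer ticket"),
    TimelineEvent(30, "response", "On-call engineer starts checking dashboards"),
    TimelineEvent(55, "escalation", "Network team is paged"),
]

if __name__ == "__main__":
    print("trigger to detection:", gap(EVENTS, "trigger", "detection"), "min")
    print("impact to detection:", gap(EVENTS, "customer_impact", "detection"), "min")
```

A positive gap between customer impact and detection means customers would notice the problem before the team does, which is usually worth a sticky note of its own.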

At each step, annotate the shadowbox with sticky notes:

  • Assumptions (“We assume the secondary region auto‑promotes within 5 minutes.”)
  • Questions (“Do we have runbooks for this failover path?”)
  • Gaps (“We don’t log this event anywhere central.”)

You’re not just solving the incident—you’re discovering how it could happen at all.
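
Sticky notes tend to get lost once the wall comes down, so one option is to transcribe them into a small structure before leaving the room. The sketch below assumes the three note kinds listed above. Attaching each note to a card is an added convention, and the card names are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

KINDS = ("assumption", "question", "gap")


@dataclass(frozen=True)
class StickyNote:
    kind: str  # one of KINDS
    card: str  # the card the note was stuck to (hypothetical names below)
    text: str


def group_by_kind(notes):
    """Return a dict mapping each kind to the notes of that kind."""
    grouped = defaultdict(list)
    for note in notes:
        grouped[note.kind].append(note)
    return dict(grouped)


def cards_by_note_count(notes):
    """Return (card, count) pairs, most-annotated card first."""
    return Counter(note.card for note in notes).most_common()


NOTES = [
    StickyNote("assumption", "secondary_region",
               "We assume the secondary region auto-promotes within 5 minutes."),
    StickyNote("question", "secondary_region",
               "Do we have runbooks for this failover path?"),
    StickyNote("gap", "auth_service",
               "We don't log this event anywhere central."),
]

if __name__ == "__main__":
    for kind, items in group_by_kind(NOTES).items():
        print(kind.upper())
        for note in items:
            print(f"  [{note.card}] {note.text}")
    print(cards_by_note_count(NOTES))
```

Cards that collect the most notes are often the ones where the team's understanding is thinnest.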


Turn It into a Tabletop Simulation

Once your diorama exists, you’re ready for a tabletop‑style exercise. These are common in cybersecurity and incident response: you simulate a scenario in a risk‑free environment and practice how you’d react.

Use the shadowbox as your tabletop board:

  • Facilitator: Guides the scenario, reveals new events (“Now you discover that backups are also corrupted”).
  • Participants: On‑call engineers, SREs, security, product, customer support, and relevant leadership.
  • Artifacts: Runbooks, dashboards, escalation paths, incident management tools—anything you’d actually use.
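
A facilitator may find it easier to keep the scenario moving with a prepared script of reveals. The following is a small sketch of such a script, assuming reveals are keyed to elapsed minutes. The `Inject` name and the timings are illustrative choices, not a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inject:
    minute: int  # elapsed exercise time at which to reveal the event
    reveal: str  # what the facilitator reads out


# Hypothetical script for the corrupted-database scenario.
SCRIPT = [
    Inject(0, "Primary database cluster reports corruption."),
    Inject(15, "Failover to the replica does not complete."),
    Inject(30, "Now you discover that backups are also corrupted."),
]


def due_injects(script, elapsed_minutes, already_revealed):
    """Return injects whose time has come and that were not revealed yet."""
    return [
        inject
        for inject in sorted(script, key=lambda item: item.minute)
        if inject.minute <= elapsed_minutes and inject not in already_revealed
    ]


if __name__ == "__main__":
    revealed = set()
    for elapsed in (0, 20, 40):
        for inject in due_injects(SCRIPT, elapsed, revealed):
            print(f"[t+{elapsed} min] {inject.reveal}")
            revealed.add(inject)
```

Keeping the reveals in a list also makes it easy to rerun the same scenario with a different group and compare how each responds.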

Benefits of a Shadowbox Tabletop

  1. Safe practice, real pressure
    People can make mistakes without harming customers, but still feel the tension and ambiguity of a real incident.

  2. Expose security and resilience gaps
    You’ll discover:
    • Missing alerts and runbooks
    • Unmonitored critical paths
    • Overreliance on specific individuals (“We just ping Alex; they always know where the logs are”)

  3. Improve cross‑functional coordination
    With multiple disciplines in the room, you see where:
    • Security and ops assumptions diverge
    • Product and support don’t get timely or clear information
    • Leadership joins too early or too late

  4. Refine procedures before they’re needed
    You can update incident runbooks, escalation trees, and communication templates while everyone still clearly remembers the simulation.


What You’ll Learn (That You Won’t See on a Dashboard)

Shadowbox exercises tend to surface quiet but dangerous realities:

  • Unknown single points of failure: That one queue, service, or human who holds everything together.
  • Hidden coupling: Two systems that “aren’t related” but always break together in the scenario.
  • Missing observability: Entire flows where you have no logs, no metrics, and no alerts.
  • Process fragility: Critical steps that exist only in someone’s memory or a forgotten internal doc.
  • Communication bottlenecks: Channels that get noisy, or stakeholders who never hear what they need.

These insights are far easier and cheaper to act on before an actual outage.


Turning Hypothetical Failures into Real Improvements

The point of all this is not to create an art project; it’s to drive concrete change. After the session, capture and prioritize:

  1. Design and architecture improvements
    • Add redundancy for true single points of failure.
    • Simplify or decouple tightly knit services.
    • Re‑evaluate trust boundaries and least‑privilege access.

  2. Process and documentation upgrades
    • Create or update runbooks for the tested scenarios.
    • Tighten incident command structure and escalation rules.
    • Improve customer and internal communication templates.

  3. Tooling and observability enhancements
    • Add or refine alerts for early signals of the scenario.
    • Improve dashboards and logs aligned with the failure story.
    • Automate repetitive or slow manual steps uncovered in the exercise.

  4. Training and readiness
    • Rotate different people into the incident commander role.
    • Use results to focus on‑call training on the riskiest areas.
    • Repeat shadowbox drills for other plausible disasters.

Over time, these changes reduce both the likelihood and the impact of real incidents.
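
To turn the captured findings into an ordered list, one simple approach is to score each by estimated impact and effort. The sketch below is an assumption rather than an established method: the 1 to 5 scales, the impact-over-effort ratio, and the sample findings are all illustrative, and a team may reasonably weigh things differently.

```python
from dataclasses import dataclass

# Categories follow the four groups of improvements listed above.
CATEGORIES = ("architecture", "process", "tooling", "training")


@dataclass(frozen=True)
class Finding:
    title: str
    category: str  # one of CATEGORIES
    impact: int    # 1 (minor) to 5 (severe), estimated by the group
    effort: int    # 1 (hours) to 5 (quarters), estimated by the group

    @property
    def score(self):
        """Higher means more benefit per unit of effort."""
        return self.impact / self.effort


def prioritize(findings):
    """Sort findings by score, highest first, with title as a tie-breaker."""
    return sorted(findings, key=lambda finding: (-finding.score, finding.title))


# Hypothetical findings from one session.
FINDINGS = [
    Finding("Add redundancy for the primary database", "architecture", 5, 4),
    Finding("Write a runbook for regional failover", "process", 4, 1),
    Finding("Alert on replication lag", "tooling", 4, 2),
]

if __name__ == "__main__":
    for finding in prioritize(FINDINGS):
        print(f"{finding.score:.2f}  [{finding.category}] {finding.title}")
```

A ratio like this favors quick wins, so it is worth checking that high-impact, high-effort items such as redundancy work do not stay at the bottom of the list indefinitely.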


Making This a Habit (Not a One‑Off)

To embed this practice into your culture:

  • Run a shadowbox pre‑mortem before major launches or migrations.
  • Schedule quarterly tabletop simulations for your top risk scenarios.
  • Vary participants so newer team members gain experience and confidence.
  • Photograph the best shadowboxes, or keep them partially intact, as teaching tools for onboarding.

The investment is small—usually a couple of hours and a few supplies—but the payoff in resilience and coordination is significant.


Conclusion: Draw the Crash Before You Drive Faster

High‑performing teams don’t just get better at cleaning up incidents; they get better at seeing them coming. An analog incident shadowbox turns abstract risk into a shared, concrete story your whole organization can understand.

By combining:

  • Pre‑mortems that assume failure has already happened,
  • Paper dioramas that visualize cascading effects, and
  • Tabletop simulations that let teams practice safely,

you can find and fix weaknesses in your systems and processes before real customers are affected.

Sometimes the most powerful way to understand complex, digital failure is to spread paper on a wall, gather the right people, and ask: “If this went catastrophically wrong, what would the story look like?”

Then build that story—with scissors and tape—until you know how to change the ending.
