The Analog Incident Diorama: Shoebox-Scale Replicas of Your Worst Outage
How to turn your worst production outage or security incident into a physical shoebox diorama that helps teams understand failure, improve incident response, and build more resilient systems.
Introduction: When Postmortems Aren’t Enough
Most teams handle major outages and security incidents the same way: a hurried incident call, a frantic Slack channel, and then a slide deck postmortem that half the company never reads.
The result? We repeat the same mistakes. Diagrams stay abstract. Human factors get buried in bullet points. And new teammates have no visceral sense of how bad “that big incident” really was or how to react when it happens again.
Enter the Analog Incident Diorama: a shoebox-scale, physical reconstruction of your worst outage or security incident. Think of it as:
- A low-tech, hands-on model of a high-tech failure
- A storyboard of disaster, inspired by analog horror and tabletop RPGs
- A collaborative training tool for engineers, security, support, and leadership
This isn’t arts and crafts for its own sake. It’s a way to make failure tangible, expose decision points and communication gaps, and turn your worst incident into a powerful learning artifact.
Why Build an Analog Incident Diorama?
A physical model forces different thinking than a Confluence doc or a sequence diagram. It:
- Slows you down enough to really unpack what happened
- Engages multiple senses, making the incident more memorable
- Levels the playing field so non-technical folks can participate
- Emphasizes narrative over just technical root causes
Instead of “the database failed,” you get:
At 09:42, the on-call SRE saw a vague alert. At 09:47, support was swamped with tickets. At 09:53, a misrouted Slack ping meant the database team didn’t see the incident until much later.
The diorama becomes a 3D storyboard that makes those moments impossible to ignore.
Step 1: Choose Your Incident (and Your Scope)
Pick one significant outage or security incident—ideally one that:
- Involved multiple teams
- Had messy human and communication factors
- Felt confusing or chaotic in the moment
Then define scope:
- Time window: e.g., from first alert to full recovery
- Key actors: systems, teams, and external dependencies
- Key decisions: where a choice or misunderstanding changed the path
You’re not trying to model the whole company. You’re building a focused, story-driven slice of reality.
Step 2: Gather Raw Material (Digital to Physical)
Collect artifacts from the real incident:
- Alert timelines and dashboards
- Chat logs and email threads
- Call recordings or incident bridges
- Ticket timelines (support, ops, security)
- Post-incident report or root cause analysis
From this, identify:
- Major events: things that changed system state
- Observations: what different people saw and when
- Decisions: who chose what, under which assumptions
- Miscommunications: pings missed, channels ignored, unclear ownership
These will become scenes and props in your diorama.
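If you want a lightweight scratchpad for this before you start cutting cards, a few lines of code can help. Here is a minimal sketch with a made-up schema (timestamp, actor, kind, summary) and hand-curated entries pulled from your artifacts; it is not tied to any particular incident tooling.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class IncidentEvent:
    timestamp: str   # e.g. "09:42"
    actor: str       # person, team, or system involved
    kind: str        # "event", "observation", "decision", or "miscommunication"
    summary: str     # one card-sized sentence

# Hand-curated from alerts, chat logs, and the post-incident report.
timeline: List[IncidentEvent] = [
    IncidentEvent("09:42", "on-call SRE", "observation", "Vague alert fires; cause unclear"),
    IncidentEvent("09:47", "support", "event", "Ticket volume spikes; queue overwhelmed"),
    IncidentEvent("09:53", "database team", "miscommunication", "Misrouted Slack ping; team never paged"),
]

# Print card-sized lines, sorted by time, ready to copy onto index cards.
for event in sorted(timeline, key=lambda e: e.timestamp):
    print(f"[{event.timestamp}] ({event.kind.upper()}) {event.actor}: {event.summary}")
```

Each printed line becomes one card, and the same list feeds the timeline layer in Step 3.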
Step 3: Build the Shoebox-Scale World
You don’t need artistic skills. You need symbolism and clarity.
Basic materials:
- A shoebox or cardboard box (or several, one per system domain)
- String, sticky notes, index cards
- LEGO/figurines, paper cutouts, or simple blocks for services and people
- Markers, tape, colored dots, yarn
Map out three core areas:
1. System Topology Layer
Use the floor of the box as a mini architecture diagram:
- Blocks for services (API, DB, cache, auth provider)
- Lines for connections and dependencies
- A different color or shape for external services (cloud provider, payment gateway, IdP)
Add reliability modeling touches:
- Redundant components: paired blocks with a shared label
- Single points of failure (SPOFs): mark in red
- Fallback paths: dashed lines to backups or degraded modes
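Before committing glue, it can help to sketch the same topology as data and let a quick script nominate which blocks deserve the red marker. A minimal sketch, using hypothetical service names and a hand-maintained dependency map rather than anything discovered automatically:

```python
# Hypothetical dependency map: service -> components it depends on.
dependencies = {
    "api":      ["db", "cache", "auth"],
    "worker":   ["db", "queue"],
    "checkout": ["api", "payment-gateway"],
}

# Components that genuinely have a redundant peer (replica, standby, etc.).
has_redundancy = {"cache"}

# Count dependents per component; anything widely shared and non-redundant
# is a single-point-of-failure candidate: mark that block in red.
dependents = {}
for service, deps in dependencies.items():
    for dep in deps:
        dependents.setdefault(dep, set()).add(service)

for component, users in sorted(dependents.items()):
    if component not in has_redundancy and len(users) > 1:
        print(f"SPOF candidate: {component} (used by {', '.join(sorted(users))})")
```

In this toy map, db gets flagged (no redundant peer, multiple dependents), which is exactly the kind of block to paint red in the box.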
2. Human & Organization Layer
On the walls or a second tier, represent people and teams:
- Little figures or cards for on-call engineers, incident commander, support, security, product, leadership
- Lines or yarn to represent communication paths (Slack, PagerDuty, email, phone)
- Special markers for broken or delayed communication
3. Timeline & Storyboard Layer
Run a strip of paper or cards across the top edge or around the box:
- Each card = a time-stamped event (09:42: first alert, 09:47: support flooded, 10:05: wrong rollback)
- Connect events down into the box with string: which system changed? which person acted?
You now have a 3D storyboard of what happened, not just a system diagram.
Step 4: Turn It Into Analog Horror (Lightly)
You don’t need jump scares, but the analog horror aesthetic is useful: slow, creeping dread and a sense of inevitability.
You can:
- Use lighting (a flashlight, phone light) to reveal scenes as you move along the timeline
- Add visual foreshadowing: a red thread leading from a small, ignored alert to a future meltdown
- Show spreading failure: green services turning red as the incident propagates
This framing helps the team feel the narrative tension of:
"We had multiple chances to notice and correct this, but we didn’t."
That feeling is exactly what drives better preparedness.
Step 5: Adapt Tabletop Exercise Techniques
Now that you have the physical model, use it like a tabletop exercise board:
1. Walk the timeline
- Move a pointer along the event cards
- Describe what each actor saw and believed at that moment
- Have the people who were actually there narrate their thinking
2. Pause at decision points
- Where did someone choose A over B?
- What info did they have? What was missing or misleading?
3. Ask "What if?" variations
- "What if this alert had gone to the right team?"
- "What if this failover had actually worked?"
- "What if support had a clearer runbook?"
4. Simulate alternative futures
- Move figures along a different path
- Change a dependency line (e.g., add a cache or circuit breaker)
- See which parts of the box still end in red (a quick code sketch of this follows below)
This turns your diorama into a safe sandbox for testing detection, communication, and recovery workflows.
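If you want to sanity-check a "What if?" before the session, the same kind of dependency map can be walked in code. This is a minimal sketch with hypothetical service names and a deliberately crude rule (a service turns red when any hard dependency is red); real failures degrade in subtler ways, which is exactly what the discussion around the box should surface.

```python
# Hypothetical dependency map: service -> hard dependencies.
dependencies = {
    "api":             ["db", "auth"],
    "worker":          ["db", "queue"],
    "checkout":        ["api", "payment-gateway"],
    "db":              [],
    "auth":            [],
    "queue":           [],
    "payment-gateway": [],
}

def failed_services(initial_failure: str, deps_map: dict) -> set:
    """A service ends up red if it fails directly or any hard dependency is red."""
    red = {initial_failure}
    changed = True
    while changed:
        changed = False
        for service, deps in deps_map.items():
            if service not in red and any(d in red for d in deps):
                red.add(service)
                changed = True
    return red

# "What if the database goes down?" vs. "What if auth goes down?"
print("db fails   ->", sorted(failed_services("db", dependencies)))
print("auth fails ->", sorted(failed_services("auth", dependencies)))
```

Remove db from worker's dependencies (the analog equivalent of re-routing a piece of yarn to a fallback) and re-run to see whether fewer parts of the box end in red.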
Step 6: Focus on Decisions and Communication, Not Just Tech
Most postmortems over-index on the technical root cause. Your diorama should deliberately spotlight human and organizational factors:
Add explicit markers for:
- Unclear ownership ("Who’s supposed to handle this alert?")
- Role confusion (two incident commanders, or none)
- Channel sprawl (five Slack channels, no single source of truth)
- Escalation delays (critical teams looped in 30+ minutes late)
- Cognitive overload (one person juggling logs, comms, customers)
Then ask, as you move through the model:
- Where was critical information trapped in one person’s head or one channel?
- Where did we optimize for speed over clarity, or vice versa, in harmful ways?
- Where could a simple ritual (status updates every 10 minutes, a comms scribe) have helped?
Write these as sticky notes and place them directly on the relevant parts of the diorama.
Step 7: Integrate Reliability Modeling Concepts
Use the diorama to teach and test reliability thinking in concrete ways.
Annotate the model with:
- Redundancy: Highlight services that truly have independent failover vs. those that appear redundant but share a hidden SPOF (same region, same credential store, same message queue); a quick check for this is sketched after this list.
- Blast radius: Color-code services by impact—what fails silently vs. loudly? What takes customers down vs. creates degraded UX?
- Failure modes: Mark different failure types (capacity, configuration, dependency, security breach, data corruption).
- Detection vs. impact: Show visually which failures get detected quickly and which linger unnoticed.
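The hidden-SPOF check in particular lends itself to a quick script. A minimal sketch with invented instance names and shared-resource fields (region, creds, queue); the point is the shape of the check, not a real inventory.

```python
# Hypothetical "redundant" pairs and the hidden resources each instance relies on.
# Two instances are only independent if they share none of these.
instances = {
    "db-primary": {"region": "us-east-1", "creds": "vault-prod", "queue": "events"},
    "db-replica": {"region": "us-east-1", "creds": "vault-prod", "queue": "events"},
    "api-a":      {"region": "us-east-1", "creds": "vault-prod", "queue": "events"},
    "api-b":      {"region": "us-west-2", "creds": "vault-prod", "queue": "events"},
}

redundant_pairs = [("db-primary", "db-replica"), ("api-a", "api-b")]

for left, right in redundant_pairs:
    shared = {k: v for k, v in instances[left].items() if instances[right].get(k) == v}
    if shared:
        print(f"{left} / {right} look redundant but share: {shared}")
```

Any pair that prints here deserves a red thread in the box running from the "paired blocks" label to the shared resource.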
Then run mini-scenarios:
- "This region goes dark—walk me through the chain of effects."
- "This credential gets leaked—what can the attacker actually reach?"
- "This cache returns stale data—who notices, and how?"
You’re building shared mental models of reliability that stick far better than a PDF.
Step 8: Make It a Cross‑Functional Ritual
The real power comes when the diorama becomes a collaborative tool, not a one-time gimmick.
Invite:
- Engineering (dev, SRE, platform)
- Security
- Customer support / success
- Product and program managers
- Incident managers or leadership
Use the session to:
- Ask, "What went wrong?" from each perspective
- Ask, "What would we do differently next time?" and capture concrete changes
- Identify training gaps (on-call readiness, tooling knowledge)
- Turn insights into tickets, runbooks, and playbook updates
Leave the diorama somewhere visible (a war room or team area), or photograph and document it, so it remains a living artifact of learning.
Step 9: Repeat With New Scenarios
Don’t stop at one.
Make analog incident dioramas a periodic exercise:
- Rebuild the same incident after major architecture or process changes
- Model new, hypothetical scenarios:
  - Major cloud provider outage
  - Ransomware attack
  - Compromised CI/CD pipeline
  - Region-wide network partition
- Compare "old world vs. new world" models to see if changes actually reduce risk
Over time, you build a physical library of near-misses and disasters, and a culture that treats them as raw material for improvement rather than embarrassment.
Conclusion: Turning Pain Into Practice
A shoebox full of yarn and paper won’t fix your systems. But it will:
- Expose how outages really unfold, beyond a neat root cause statement
- Make invisible dependencies and SPOFs impossible to ignore
- Highlight decision points, communication paths, and human constraints
- Give teams a safe way to practice responses to complex, messy failure
Digital tools are great for real-time response. But for reflection, teaching, and building shared understanding, analog can be surprisingly powerful.
Your worst outage is already in the past. Turn it into a shoebox-scale replica that helps ensure the next incident is shorter, clearer, and far less painful—for your systems, your teams, and your customers.