The Analog Incident Diorama: Shoebox-Scale Replicas of Your Worst Outage
How to turn your worst production outage or security incident into a physical shoebox diorama that helps teams understand failure, improve incident response, and build more resilient systems.
Introduction: When Postmortems Aren’t Enough
Most teams handle major outages and security incidents the same way: a hurried incident call, a frantic Slack channel, and then a slide deck postmortem that half the company never reads.
The result? We repeat the same mistakes. Diagrams stay abstract. Human factors get buried in bullet points. And new teammates have no visceral sense of how bad “that big incident” really was or how to react when it happens again.
Enter the Analog Incident Diorama: a shoebox-scale, physical reconstruction of your worst outage or security incident. Think of it as:
- A low-tech, hands-on model of a high-tech failure
- A storyboard of disaster, inspired by analog horror and tabletop RPGs
- A collaborative training tool for engineers, security, support, and leadership
This isn’t arts and crafts for its own sake. It’s a way to make failure tangible, expose decision points and communication gaps, and turn your worst incident into a powerful learning artifact.
Why Build an Analog Incident Diorama?
A physical model forces different thinking than a Confluence doc or a sequence diagram. It:
- Slows you down enough to really unpack what happened
- Engages multiple senses, making the incident more memorable
- Levels the playing field so non-technical folks can participate
- Emphasizes narrative over just technical root causes
Instead of “the database failed,” you get:
At 09:42, the on-call SRE saw a vague alert. At 09:47, support was swamped with tickets. At 09:53, a misrouted Slack ping meant the database team didn’t see the incident until much later.
The diorama becomes a 3D storyboard that makes those moments impossible to ignore.
Step 1: Choose Your Incident (and Your Scope)
Pick one significant outage or security incident—ideally one that:
- Involved multiple teams
- Had messy human and communication factors
- Felt confusing or chaotic in the moment
Then define scope:
- Time window: e.g., from first alert to full recovery
- Key actors: systems, teams, and external dependencies
- Key decisions: where a choice or misunderstanding changed the path
You’re not trying to model the whole company. You’re building a focused, story-driven slice of reality.
Step 2: Gather Raw Material (Digital to Physical)
Collect artifacts from the real incident:
- Alert timelines and dashboards
- Chat logs and email threads
- Call recordings or incident bridges
- Ticket timelines (support, ops, security)
- Post-incident report or root cause analysis
From this, identify:
- Major events: things that changed system state
- Observations: what different people saw and when
- Decisions: who chose what, under which assumptions
- Miscommunications: pings missed, channels ignored, unclear ownership
These will become scenes and props in your diorama.
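If you want a lightweight scratchpad for this before you start cutting cards, a few lines of code can help. Here is a minimal sketch with a made-up schema (timestamp, actor, kind, summary) and hand-curated entries pulled from your artifacts; it is not tied to any particular incident tooling.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class IncidentEvent:
    timestamp: str   # e.g. "09:42"
    actor: str       # person, team, or system involved
    kind: str        # "event", "observation", "decision", or "miscommunication"
    summary: str     # one card-sized sentence

# Hand-curated from alerts, chat logs, and the post-incident report.
timeline: List[IncidentEvent] = [
    IncidentEvent("09:42", "on-call SRE", "observation", "Vague alert fires; cause unclear"),
    IncidentEvent("09:47", "support", "event", "Ticket volume spikes; queue overwhelmed"),
    IncidentEvent("09:53", "database team", "miscommunication", "Misrouted Slack ping; team never paged"),
]

# Print card-sized lines, sorted by time, ready to copy onto index cards.
for event in sorted(timeline, key=lambda e: e.timestamp):
    print(f"[{event.timestamp}] ({event.kind.upper()}) {event.actor}: {event.summary}")
```

Each printed line becomes one card, and the same list feeds the timeline layer in Step 3.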
Step 3: Build the Shoebox-Scale World
You don’t need artistic skills. You need symbolism and clarity.
Basic materials:
- A shoebox or cardboard box (or several, one per system domain)
- String, sticky notes, index cards
- LEGO/figurines, paper cutouts, or simple blocks for services and people
- Markers, tape, colored dots, yarn
Map out three core areas:
1. System Topology Layer
Use the floor of the box as a mini architecture diagram:
- Blocks for services (API, DB, cache, auth provider)
- Lines for connections and dependencies
- A different color or shape for external services (cloud provider, payment gateway, IdP)
Add reliability modeling touches:
- Redundant components: paired blocks with a shared label
- Single points of failure (SPOFs): mark in red
- Fallback paths: dashed lines to backups or degraded modes
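Before committing glue, it can help to sketch the same topology as data and let a quick script nominate which blocks deserve the red marker. A minimal sketch, using hypothetical service names and a hand-maintained dependency map rather than anything discovered automatically:

```python
# Hypothetical dependency map: service -> components it depends on.
dependencies = {
    "api":      ["db", "cache", "auth"],
    "worker":   ["db", "queue"],
    "checkout": ["api", "payment-gateway"],
}

# Components that genuinely have a redundant peer (replica, standby, etc.).
has_redundancy = {"cache"}

# Count dependents per component; anything widely shared and non-redundant
# is a single-point-of-failure candidate: mark that block in red.
dependents = {}
for service, deps in dependencies.items():
    for dep in deps:
        dependents.setdefault(dep, set()).add(service)

for component, users in sorted(dependents.items()):
    if component not in has_redundancy and len(users) > 1:
        print(f"SPOF candidate: {component} (used by {', '.join(sorted(users))})")
```

In this toy map, db gets flagged (no redundant peer, multiple dependents), which is exactly the kind of block to paint red in the box.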
2. Human & Organization Layer
On the walls or a second tier, represent people and teams:
- Little figures or cards for on-call engineers, incident commander, support, security, product, leadership
- Lines or yarn to represent communication paths (Slack, PagerDuty, email, phone)
- Special markers for broken or delayed communication
3. Timeline & Storyboard Layer
Run a strip of paper or cards across the top edge or around the box:
- Each card = a time-stamped event (09:42: first alert, 09:47: support flooded, 10:05: wrong rollback)
- Connect events down into the box with string: which system changed? which person acted?
You now have a 3D storyboard of what happened, not just a system diagram.
Step 4: Turn It Into Analog Horror (Lightly)
You don’t need jump scares, but the analog horror aesthetic is useful: slow, creeping dread and a sense of inevitability.
You can:
- Use lighting (a flashlight, phone light) to reveal scenes as you move along the timeline
- Add visual foreshadowing: a red thread leading from a small, ignored alert to a future meltdown
- Show spreading failure: green services turning red as the incident propagates
This framing helps the team feel the narrative tension of:
"We had multiple chances to notice and correct this, but we didn’t."
That feeling is exactly what drives better preparedness.
Step 5: Adapt Tabletop Exercise Techniques
Now that you have the physical model, use it like a tabletop exercise board:
1. Walk the timeline
- Move a pointer along the event cards
- Describe what each actor saw and believed at that moment
- Have the people who were actually there narrate their thinking
2. Pause at decision points
- Where did someone choose A over B?
- What info did they have? What was missing or misleading?
3. Ask "What if?" variations
- "What if this alert had gone to the right team?"
- "What if this failover had actually worked?"
- "What if support had a clearer runbook?"
4. Simulate alternative futures
- Move figures along a different path
- Change a dependency line (e.g., add a cache or circuit breaker)
- See which parts of the box still end in red (a quick code sketch of this follows below)
This turns your diorama into a safe sandbox for testing detection, communication, and recovery workflows.
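If you want to sanity-check a "What if?" before the session, the same kind of dependency map can be walked in code. This is a minimal sketch with hypothetical service names and a deliberately crude rule (a service turns red when any hard dependency is red); real failures degrade in subtler ways, which is exactly what the discussion around the box should surface.

```python
# Hypothetical dependency map: service -> hard dependencies.
dependencies = {
    "api":             ["db", "auth"],
    "worker":          ["db", "queue"],
    "checkout":        ["api", "payment-gateway"],
    "db":              [],
    "auth":            [],
    "queue":           [],
    "payment-gateway": [],
}

def failed_services(initial_failure: str, deps_map: dict) -> set:
    """A service ends up red if it fails directly or any hard dependency is red."""
    red = {initial_failure}
    changed = True
    while changed:
        changed = False
        for service, deps in deps_map.items():
            if service not in red and any(d in red for d in deps):
                red.add(service)
                changed = True
    return red

# "What if the database goes down?" vs. "What if auth goes down?"
print("db fails   ->", sorted(failed_services("db", dependencies)))
print("auth fails ->", sorted(failed_services("auth", dependencies)))
```

Remove db from worker's dependencies (the analog equivalent of re-routing a piece of yarn to a fallback) and re-run to see whether fewer parts of the box end in red.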
Step 6: Focus on Decisions and Communication, Not Just Tech
Most postmortems over-index on the technical root cause. Your diorama should deliberately spotlight human and organizational factors:
Add explicit markers for:
- Unclear ownership ("Who’s supposed to handle this alert?")
- Role confusion (two incident commanders, or none)
- Channel sprawl (five Slack channels, no single source of truth)
- Escalation delays (critical teams looped in 30+ minutes late)
- Cognitive overload (one person juggling logs, comms, customers)
Then ask, as you move through the model:
- Where was critical information trapped in one person’s head or one channel?
- Where did we optimize for speed over clarity, or vice versa, in harmful ways?
- Where could a simple ritual (status updates every 10 minutes, a comms scribe) have helped?
Write these as sticky notes and place them directly on the relevant parts of the diorama.
Step 7: Integrate Reliability Modeling Concepts
Use the diorama to teach and test reliability thinking in concrete ways.
Annotate the model with:
- Redundancy: Highlight services that truly have independent failover vs. those that appear redundant but share a hidden SPOF (same region, same credential store, same message queue); a quick check for this is sketched after this list.
- Blast radius: Color-code services by impact—what fails silently vs. loudly? What takes customers down vs. creates degraded UX?
- Failure modes: Mark different failure types (capacity, configuration, dependency, security breach, data corruption).
- Detection vs. impact: Show visually which failures get detected quickly and which linger unnoticed.
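The hidden-SPOF check in particular lends itself to a quick script. A minimal sketch with invented instance names and shared-resource fields (region, creds, queue); the point is the shape of the check, not a real inventory.

```python
# Hypothetical "redundant" pairs and the hidden resources each instance relies on.
# Two instances are only independent if they share none of these.
instances = {
    "db-primary": {"region": "us-east-1", "creds": "vault-prod", "queue": "events"},
    "db-replica": {"region": "us-east-1", "creds": "vault-prod", "queue": "events"},
    "api-a":      {"region": "us-east-1", "creds": "vault-prod", "queue": "events"},
    "api-b":      {"region": "us-west-2", "creds": "vault-prod", "queue": "events"},
}

redundant_pairs = [("db-primary", "db-replica"), ("api-a", "api-b")]

for left, right in redundant_pairs:
    shared = {k: v for k, v in instances[left].items() if instances[right].get(k) == v}
    if shared:
        print(f"{left} / {right} look redundant but share: {shared}")
```

Any pair that prints here deserves a red thread in the box running from the "paired blocks" label to the shared resource.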
Then run mini-scenarios:
- "This region goes dark—walk me through the chain of effects."
- "This credential gets leaked—what can the attacker actually reach?"
- "This cache returns stale data—who notices, and how?"
You’re building shared mental models of reliability that stick far better than a PDF.
Step 8: Make It a Cross‑Functional Ritual
The real power comes when the diorama becomes a collaborative tool, not a one-time gimmick.
Invite:
- Engineering (dev, SRE, platform)
- Security
- Customer support / success
- Product and program managers
- Incident managers or leadership
Use the session to:
- Ask, "What went wrong?" from each perspective
- Ask, "What would we do differently next time?" and capture concrete changes
- Identify training gaps (on-call readiness, tooling knowledge)
- Turn insights into tickets, runbooks, and playbook updates
Leave the diorama somewhere visible (a war room or team area), or photograph and document it, so it remains a living artifact of learning.
Step 9: Repeat With New Scenarios
Don’t stop at one.
Make analog incident dioramas a periodic exercise:
- Rebuild the same incident after major architecture or process changes
- Model new, hypothetical scenarios:
  - Major cloud provider outage
  - Ransomware attack
  - Compromised CI/CD pipeline
  - Region-wide network partition
- Compare "old world vs. new world" models to see if changes actually reduce risk
Over time, you build a physical library of near-misses and disasters, and a culture that treats them as raw material for improvement rather than embarrassment.
Conclusion: Turning Pain Into Practice
A shoebox full of yarn and paper won’t fix your systems. But it will:
- Expose how outages really unfold, beyond a neat root cause statement
- Make invisible dependencies and SPOFs impossible to ignore
- Highlight decision points, communication paths, and human constraints
- Give teams a safe way to practice responses to complex, messy failure
Digital tools are great for real-time response. But for reflection, teaching, and building shared understanding, analog can be surprisingly powerful.
Your worst outage is already in the past. Turn it into a shoebox-scale replica that helps ensure the next incident is shorter, clearer, and far less painful—for your systems, your teams, and your customers.