The Analog Incident Story Aquarium Shelf: Building a Stacked-Paper Skyline for Multi-Layered Outages

Introduction

In a world obsessed with dashboards, digital twins, and AI-driven observability, there’s something quietly radical about going back to paper.

Imagine walking into your team space and seeing an entire shelf of incident stories rendered as a skyline: stacks of paper, each stack a single outage, layered from top to bottom with the decisions, signals, dependencies, and organizational factors that shaped it. This is the Analog Incident Story Aquarium Shelf—a physical, stacked-paper visualization of complex, multi-layered outages.

It looks a bit like an aquarium of stories or a cityscape of incidents. But it’s more than decor. It’s a tool for teaching layered thinking, revealing cascading dependencies, and making reliability a shared, tangible responsibility.

In this post, we’ll explore how this analog “story aquarium” works, why teams that adopt multi-layered analysis report 30–55% less downtime, and how ideas borrowed from failure modeling, low-tech games, and even battery disassembly lines can transform how your team learns from incidents.

Why Multi-Layered Outage Analysis Matters

Many post-mortems stop at the first or second “why.” A bad configuration was deployed. A cache was mis-sized. A failover didn’t trigger.

But outages are rarely single-cause events. They’re multi-layered stories:

A technical symptom on the surface
Sitting on top of hidden dependency chains
Shaped by processes, tools, and communication patterns
Influenced by organizational structures and incentives

Teams that formally build a structured four-layer analysis model into their incident reviews have reported 30–55% reductions in downtime. The mechanism isn’t magic; it’s a shift in mindset, organized around four layers:

System layer – What failed technically?
Dependency layer – How did upstream and downstream services interact?
Process layer – What procedures, runbooks, or workflows shaped the response?
Organizational layer – What roles, incentives, and communication structures mattered?
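To make the four layers concrete, here is a minimal Python sketch, assuming a review is stored as a plain mapping from layer to free-text notes. The `Layer` enum and the `missing_layers` helper are hypothetical names for illustration, not part of any existing tool. The point is simply that a review which stops at the first “why” leaves three layers unanswered.

```python
from enum import Enum


class Layer(Enum):
    """The four analysis layers, in the order a review walks them."""
    SYSTEM = "What failed technically?"
    DEPENDENCY = "How did upstream and downstream services interact?"
    PROCESS = "What procedures, runbooks, or workflows shaped the response?"
    ORGANIZATIONAL = "What roles, incentives, and communication structures mattered?"


def missing_layers(review: dict[Layer, str]) -> list[Layer]:
    """Return the layers a review has not answered (absent or blank notes)."""
    return [layer for layer in Layer if not review.get(layer, "").strip()]


# Hypothetical review that stopped at the first "why".
review = {Layer.SYSTEM: "Bad config pushed to the cache tier."}
for layer in missing_layers(review):
    print(f"Unanswered: {layer.name.title()} - {layer.value}")
```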

When you train people to see all four layers, they stop “fixing the symptom” and start reshaping the system.

The Analog Incident Story Aquarium Shelf is a way to make those layers visible and inspectable, using nothing more than paper, pens, and shelves.

What Is the Incident Story Aquarium Shelf?

At its core, the Story Aquarium is a physical, 3D representation of incident stories:

Each stack of paper = one incident.
Each layer in the stack = one perspective or level of analysis.
Rows of stacks on a shelf = your landscape of incidents over time.

You can think of it as a stacked-paper skyline:

Tall stacks: complex, multi-layer analyses.
Short stacks: shallowly understood incidents.
Gaps in the skyline: blind spots in your review process.
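The skyline reading can be sketched in a few lines, assuming each incident is recorded only as a label and the number of layers completed so far. The `render_skyline` and `classify` functions, and the choice of three layers as the “tall” threshold, are illustrative assumptions rather than a fixed rule.

```python
def render_skyline(stacks: dict[str, int], max_layers: int = 4) -> str:
    """Draw one column per incident, with column height = layers completed.

    A zero-height column stays blank: a visible gap in the skyline.
    """
    names = list(stacks)
    rows = []
    # Draw from the top layer down, the way the shelf looks from the front.
    for level in range(max_layers, 0, -1):
        row = " ".join("#" if stacks[name] >= level else " " for name in names)
        rows.append(row.rstrip())
    rows.append(" ".join(name[0] for name in names))  # one-letter labels
    return "\n".join(rows)


def classify(layers_done: int, tall_at: int = 3) -> str:
    """Label a stack: 0 layers is a gap, tall_at or more is tall, else short."""
    if layers_done == 0:
        return "gap"
    return "tall" if layers_done >= tall_at else "short"


# Hypothetical shelf: incident label -> layers completed so far.
shelf = {"auth": 4, "billing": 1, "cdn": 0, "dns": 3}
print(render_skyline(shelf))
for name, height in shelf.items():
    print(f"{name}: {classify(height)}")
```

In the example, `cdn` prints as an empty column, which is exactly the kind of gap the physical skyline is meant to make obvious.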

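The sheets in a stack might map onto the four-layer model like this, with a narrative sheet added on top and the process and organizational layers combined at the bottom: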

Top sheet – Narrative & impact
- Short story: what happened, when, and to whom.
- Customer impact, error budgets, timeline highlights.
Second layer – Technical fault & signals
- Key metrics, logs, and failure modes.
- Fault propagation within the system boundaries.
Third layer – Dependencies & propagation
- Diagrams of upstream/downstream services.
- Notes on where and how the failure spread.
Bottom layer – Process & organizational context
- Hand-offs, escalations, playbook usage.
- Staffing, incentives, communication frictions.

Slide a stack out of its slot, and you have a small incident book you can read from top to bottom.

From Digital Modeling to Analog Story Stacks

Modern resilience tools—like Smart TS XL and other scenario-modeling frameworks—simulate how failures cascade through complex architectures. They:

Map dependencies
Model propagation chains
Help teams test modernization plans without risking production

The Story Aquarium is the analog mirror of this idea.

Instead of a synthetic model:

You use real incidents as your data.
You map actual propagation paths on paper.
You visualize before-and-after states (e.g., “What we thought dependencies looked like vs. what actually happened”).

If you treat each incident as a small case study in cascading failure, your shelves become a library of pre-modeled risk chains. When you consider a new deployment or architecture change, you can:

Pull related incident stacks.
Scan the dependency layers.
Ask: “Are we about to recreate this failure, just in a different place?”
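If the services named on each stack’s dependency sheet are also transcribed into a small index, the “pull related stacks” step can be sketched as a ranking by shared services. This assumes a hypothetical index mapping incident labels to the set of services on their dependency sheets; `related_stacks` and the example data are illustrative, and a spreadsheet would serve just as well.

```python
def related_stacks(
    shelf: dict[str, set[str]], touched: set[str]
) -> list[tuple[str, set[str]]]:
    """Rank incident stacks by how many of the changed services they involve."""
    hits = [(incident, deps & touched) for incident, deps in shelf.items()]
    hits = [(incident, overlap) for incident, overlap in hits if overlap]
    # Most overlap first; ties broken alphabetically so the pull list is stable.
    return sorted(hits, key=lambda item: (-len(item[1]), item[0]))


# Hypothetical shelf index and planned change.
shelf_index = {
    "login outage": {"auth", "session-cache", "dns"},
    "checkout errors": {"payments", "session-cache"},
    "slow search": {"search", "cdn"},
}
planned_change = {"session-cache", "auth"}
for incident, shared in related_stacks(shelf_index, planned_change):
    print(f"Pull '{incident}': shares {sorted(shared)}")
```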

This physical ritual encourages proactive thinking: you’re not waiting for the next outage; you’re learning from the previous stack of stories.

Designing the Four-Layer Paper Model

A practical pattern for your stacked-paper skyline might look like this:

Layer 1: The Story Card (Top)

One page, large font.
Brief, human-readable incident narrative.
A simple “comic strip” timeline: key events with times.
Fields: Summary, Impact, Customer Perspective.
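Before drawing the comic-strip panels by hand, it can help to sort the events and compute elapsed time from the first one. A minimal sketch, assuming events are captured as (ISO 8601 timestamp, description) pairs; `comic_strip` and the sample events are hypothetical.

```python
from datetime import datetime


def comic_strip(events: list[tuple[str, str]]) -> list[str]:
    """Turn (timestamp, description) pairs into ordered panels with elapsed minutes."""
    if not events:
        return []
    parsed = sorted((datetime.fromisoformat(ts), text) for ts, text in events)
    start = parsed[0][0]
    panels = []
    for when, text in parsed:
        minutes = int((when - start).total_seconds() // 60)
        panels.append(f"[T+{minutes:>3} min] {when:%H:%M}  {text}")
    return panels


# Hypothetical events, deliberately recorded out of order.
events = [
    ("2024-01-10T14:20:00", "Rollback completes, errors drop"),
    ("2024-01-10T13:52:00", "Config change deployed"),
    ("2024-01-10T13:58:00", "Error-rate alert fires"),
]
print("\n".join(comic_strip(events)))
```

Each returned line is one panel, so the output can be copied straight onto the Story Card.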

Layer 2: System & Signals

Architecture sketch: boxes and arrows, no more.
Mark the initial failure point.
Note the primary observable signals: metrics, logs, alerts.
Fields: Entry point of failure, Health signals, Detection path.

Layer 3: Dependencies & Propagation

Draw how the failure moved: arrows showing each hop.
Call out hidden or “unknown” dependencies discovered.
Note any cascading disruptions (e.g., retries, thundering herds).
Fields: Dependencies we knew, Dependencies we learned, Propagation chain.
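The Layer 3 fields lend themselves to a quick cross-check before anyone draws arrows. A minimal sketch, assuming both the known dependencies and the observed hops are written as (a, b) pairs meaning “trouble in a can reach b”. The helper names and the sample services are hypothetical, and the chain follower assumes a single path rather than a branching tree.

```python
Edge = tuple[str, str]  # (a, b): trouble in service a can reach service b


def dependencies_learned(known: set[Edge], hops: list[Edge]) -> set[Edge]:
    """Hops observed during the incident that nobody had on the known list."""
    return set(hops) - known


def propagation_chain(hops: list[Edge], origin: str) -> list[str]:
    """Follow recorded hops from the origin, one arrow at a time.

    If a service fanned out, only its first recorded hop is followed.
    Stops at a repeat so a retry loop is drawn only once.
    """
    next_hop: dict[str, str] = {}
    for src, dst in hops:
        next_hop.setdefault(src, dst)
    chain = [origin]
    while chain[-1] in next_hop and next_hop[chain[-1]] not in chain:
        chain.append(next_hop[chain[-1]])
    return chain


# Hypothetical incident data.
known = {("session-cache", "auth"), ("auth", "api")}
hops = [("session-cache", "auth"), ("auth", "api"), ("api", "batch-jobs")]
print("Learned:", sorted(dependencies_learned(known, hops)))
print(" -> ".join(propagation_chain(hops, "session-cache")))
```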

Layer 4: Process & Organization

Who got paged first? Who actually fixed it?
Which runbooks or processes were used—or missing?
Where did communication stall or accelerate recovery?
Fields: Process gaps, Misaligned incentives, Team interactions.

Optional deeper layers can cover:

Risk & controls (what guardrails existed or failed)
Remediation status (committed vs. completed work)
Learning objectives (what this incident will teach newcomers)

As you adopt this pattern consistently, the shelf becomes a coherent story architecture rather than a random archive.

What Battery Disassembly Can Teach Incident Analysis

Automated battery disassembly frameworks in manufacturing offer a surprising analogy. They:

Break down complex objects into clear process chains and defined steps.
Ensure each teardown is safe, repeatable, and inspectable.

Apply that mindset to your incidents:

Treat each incident as something to systematically disassemble.
Standardize the steps: from raw timeline → layered analysis → shelf placement.
Make each layer a step in a teardown line: what, how, why, and who.
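A standardized teardown needs very little machinery to enforce. A minimal sketch, assuming each incident moves through the three stages named above and may only advance once the fields for its current stage are filled in. The stage names come from the text; the required field names and the `advance` helper are illustrative assumptions.

```python
STAGES = ["raw timeline", "layered analysis", "shelf placement", "shelved"]

# Fields that must be filled in before an incident may leave each stage.
EXIT_CRITERIA = {
    "raw timeline": ["timeline"],
    "layered analysis": ["system", "dependency", "process", "organization"],
    "shelf placement": ["shelf_slot"],
}


def advance(incident: dict) -> str:
    """Move an incident to the next stage, refusing to skip incomplete work."""
    stage = incident["stage"]
    if stage == STAGES[-1]:
        return stage  # already shelved; nothing left to do
    missing = [field for field in EXIT_CRITERIA[stage] if not incident.get(field)]
    if missing:
        raise ValueError(f"cannot leave {stage!r}: missing {missing}")
    incident["stage"] = STAGES[STAGES.index(stage) + 1]
    return incident["stage"]


# Hypothetical incident partway through the line.
incident = {"stage": "raw timeline", "timeline": ["13:52 deploy", "13:58 alert"]}
print(advance(incident))  # the timeline is filled in, so it moves on
try:
    advance(incident)
except ValueError as err:
    print(err)  # blocked until all four layers are written up
```

Raising an error rather than silently skipping keeps every stack on the shelf at the same depth of analysis.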

This clarity means:

New team members can follow a repeatable “disassembly” pattern.
Auditors or leadership can inspect both consistency and depth of analysis.
You normalize the idea that incidents are process artifacts, not personal failures.

You’re not hunting for a culprit; you’re tearing down a complex system event into analyzable layers.

Learning by Doing: From Paper Planes to Paper Skylines

Low-tech exercises such as the classic paper-plane team challenge show how simple materials can teach sophisticated concepts, including:

Iterative improvement
Feedback loops
Measuring and refining processes

A paper-based outage skyline extends this philosophy to SRE and operations:

Teams gather around a table with templates and markers.
They construct the incident stack layer by layer.
They physically move sheets, reorder them, and discuss.

This tactile, collaborative ritual:

Slows thinking down just enough to be deliberate.
Makes abstract concepts (dependencies, incentives, propagation) visible.
Lowers the barrier for cross-functional participation—anyone can read or annotate paper.

Just as throwing and redesigning paper planes makes process improvement visceral, building and revisiting paper incident stacks makes multi-layered outage analysis stick.

The Power of a Shared, Analog Artifact

Modern reliability work often suffers from fragmentation:

SREs live in dashboards.
Product managers live in docs and tickets.
Leadership sees summarized slides.

A shared shelf of stacked incident stories becomes a unifying artifact:

Engineers see how their alerts and runbooks played out.
PMs see customer narratives and timelines.
Leadership sees patterns in process and organization.

Benefits include:

Shared vocabulary: people start talking about “layers” rather than “root cause.”
Faster onboarding: newcomers can literally pull three incidents off the shelf and learn how the system fails.
Pattern discovery: recurring issues in certain layers (e.g., process or org) become visually obvious.
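Pattern discovery gets easier if reviewers write a few short tags on each layer’s sheet and someone transcribes them. A minimal sketch, assuming that hypothetical tagging convention; `recurring` and the sample tags are illustrative.

```python
from collections import Counter


def recurring(
    shelf: list[dict[str, list[str]]], min_incidents: int = 2
) -> dict[str, list[tuple[str, int]]]:
    """Per layer, list tags that appear in at least `min_incidents` incidents."""
    per_layer: dict[str, Counter] = {}
    for incident in shelf:
        for layer, tags in incident.items():
            # set() so a single incident counts each tag at most once.
            per_layer.setdefault(layer, Counter()).update(set(tags))
    result = {}
    for layer, counts in per_layer.items():
        hits = sorted(
            ((tag, n) for tag, n in counts.items() if n >= min_incidents),
            key=lambda item: (-item[1], item[0]),
        )
        if hits:
            result[layer] = hits
    return result


# Hypothetical tags transcribed from three stacks.
shelf = [
    {"system": ["bad-config"], "process": ["stale-runbook"], "organization": ["unclear-owner"]},
    {"system": ["cache-sizing"], "process": ["stale-runbook", "late-page"]},
    {"dependency": ["retry-storm"], "process": ["stale-runbook"], "organization": ["unclear-owner"]},
]
print(recurring(shelf))
```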

The Story Aquarium turns reliability from something hidden in logs and tools into a public, inspectable story architecture.

How to Get Started

You can pilot an Analog Incident Story Aquarium in a week:

Choose 3–5 recent incidents.
Define your four layers and create simple, one-page templates for each.
Run a workshop where a cross-functional group (SRE, dev, PM, support) fills in layers together.
Stack and label each incident and place them on a visible shelf.
Use them in rituals: pre-mortems, design reviews, and onboarding sessions.
Iterate: refine templates as you learn what surfaces meaningful patterns.
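For step 2, the one-page templates can be generated from a single definition so that every stack uses the same fields, and iterating on the format means editing one list. A minimal sketch, assuming plain-text pages printed on ordinary paper; the field lists are copied from the four-layer paper model described earlier, while `TEMPLATES` and `render_template` are hypothetical names.

```python
TEMPLATES = {
    "Layer 1: The Story Card": ["Summary", "Impact", "Customer Perspective"],
    "Layer 2: System & Signals": ["Entry point of failure", "Health signals", "Detection path"],
    "Layer 3: Dependencies & Propagation": [
        "Dependencies we knew", "Dependencies we learned", "Propagation chain",
    ],
    "Layer 4: Process & Organization": [
        "Process gaps", "Misaligned incentives", "Team interactions",
    ],
}


def render_template(incident: str, layer: str, lines_per_field: int = 4, width: int = 60) -> str:
    """Return one printable page: a header, then each field with blank writing lines."""
    page = [f"{incident} / {layer}", "=" * width]
    for field in TEMPLATES[layer]:
        page.append(f"{field}:")
        page.extend("_" * width for _ in range(lines_per_field))
        page.append("")  # breathing room between fields
    return "\n".join(page)


# Print a full four-page stack for a hypothetical incident.
for layer in TEMPLATES:
    print(render_template("login outage", layer, lines_per_field=2))
    print()
```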

You don’t need perfect templates on day one. The act of iterating the format is itself an exercise in process improvement.

Conclusion

The Analog Incident Story Aquarium Shelf is not a replacement for your observability stack or modeling tools. It’s a complement—a way to translate complex, multi-layered outages into a tangible skyline of stories.

By combining:

A four-layer analysis model (with reported downtime reductions of 30–55%),
The scenario mindset of tools like Smart TS XL,
The stepwise clarity of battery disassembly lines, and
The learning-by-doing spirit of paper-plane challenges,

you create a low-tech but high-impact system for teaching, sharing, and improving reliability.

In an era of increasingly opaque systems, a shelf of stacked paper can be a quietly powerful thing: a physical reminder that every outage is a story—and every story, carefully disassembled, is an opportunity to build a more resilient skyline.