The Analog Incident Story Aquarium: Designing a Glass‑Walled Desk Tank Where Outages Swim in Slow Motion

Introduction: When Outages Become Creatures

Most teams treat outages like jump scares: sudden flashes of alerts, frantic scrambling, and then, once systems are back up, a collective effort to forget. The postmortem slides land in a shared drive and gather digital dust, and the next incident feels just as chaotic as the last.

But what if incidents weren’t jump scares at all? What if they were slow, uncanny creatures you could study—safely—through glass?

Imagine a glass‑walled tank on the corner of your desk, right beside your monitor. Inside it, past outages drift in slow motion, each one a distinct organism with odd behaviors, strange life cycles, and specific environmental triggers. You can tap the glass, take notes, watch their movements. They never again get to ambush you; they live here now, archived and observable.

Welcome to the Analog Incident Story Aquarium: a metaphorical, always‑on exhibit where your outages are preserved as specimens, your responders become trained naturalists, and your organization learns in public—without getting bitten.

The Aquarium: A Glass Wall Between You and Chaos

The core idea is simple: outages are too important to forget and too costly to treat as one‑off anomalies. The aquarium is your team’s shared mental model, and it gives you three things:

A safe boundary: incidents are dangerous in production, but in the aquarium they’re slowed down and contained.
A permanent exhibit: nothing vanishes after the RCA; it joins the collection.
A learning interface: you peer through the glass to understand how these creatures behave.

Where a typical postmortem reads like a police report, an aquarium specimen is more like a field guide entry:

What does this outage look like when it first appears?
How does it move through your system? Fast spike? Slow leak?
Under what conditions does it emerge from the murk?
How did it ultimately calm down or die out?

This reframing matters. When engineers can see outages as recurring ecological patterns instead of random horror, they’re better equipped to recognize early signs and apply learned responses.

Analog Horror Aesthetics: Making Incidents Memorable

Incidents are often described with dry language: 500 errors increased, DB CPU high, rollback applied. Useful, but forgettable.

Analog horror (think the "Salem Watertower Incident" VHS style: distorted tapes, eerie timestamps) shows how unsettling the right mix of context and texture can be. You can borrow this aesthetic to make incidents uncanny enough to remember, without trivializing their impact.

Consider giving each major outage an analog horror card in your aquarium:

A title: “The Night of the Silent Queue” instead of “Message Bus Latency 2024‑03‑14”.
A still frame: a graph screenshot with glitch‑like annotations, timestamp burned in the corner.
A tagline: one unsettling line that captures the vibe: “Everything said it was healthy. Nothing moved.”

This isn’t about dramatizing suffering; it’s about anchoring the memory. People remember stories, aesthetics, and vibes far more readily than ticket numbers. An outage that feels like a distinct creature is easier to recognize when it starts swimming toward you again.

Specimens in the Tank: Outages as Creatures

In the analog incident aquarium, each notable outage becomes a specimen with a standardized profile.

1. Taxonomy: What Kind of Creature Is This?

Classify your incidents like you’re building a field guide:

Genus: Latency, Availability, Data Integrity, Performance, Security
Species:
- Burrower (silent data corruption)
- Surface Breacher (sudden public outage)
- Reef Choker (resource exhaustion)
- Shadow Swimmer (intermittent, hard to reproduce)

The point isn’t scientific precision; it’s shared language. Saying “We’ve seen this Reef Choker before” is shorthand for a whole class of resource‑related meltdowns.
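If you want the shared language to survive outside of conversation, the taxonomy can be pinned down in a few lines of code. The following is a minimal sketch: the genus and species names come from the list above, while the class names (`Genus`, `Species`, `Taxonomy`) and the `label` helper are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Genus(Enum):
    LATENCY = "Latency"
    AVAILABILITY = "Availability"
    DATA_INTEGRITY = "Data Integrity"
    PERFORMANCE = "Performance"
    SECURITY = "Security"


class Species(Enum):
    # The value is the short description used in the field guide.
    BURROWER = "silent data corruption"
    SURFACE_BREACHER = "sudden public outage"
    REEF_CHOKER = "resource exhaustion"
    SHADOW_SWIMMER = "intermittent, hard to reproduce"


@dataclass(frozen=True)
class Taxonomy:
    genus: Genus
    species: Species

    def label(self) -> str:
        # Turn REEF_CHOKER into "Reef Choker" for display.
        name = self.species.name.replace("_", " ").title()
        return f"{name} ({self.genus.value}): {self.species.value}"


if __name__ == "__main__":
    print(Taxonomy(Genus.PERFORMANCE, Species.REEF_CHOKER).label())
```

Keeping the species in an enum means a typo like "Reef Chocker" fails loudly instead of quietly creating a new creature.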

2. Behavior: How Does It Move?

Each specimen entry answers:

Onset pattern: sudden spike, stepwise climb, slow drift?
Spread: which services were affected in what order?
Signals: what did logs, metrics, traces, and human reports show?
Escape attempts: what made it worse before it got better?

Think time‑lapse nature documentary: you’re reconstructing a creature’s behavior frame by frame.
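The onset patterns above can be roughed out from a metric series. What follows is a deliberately crude heuristic, not an established method: the function name `classify_onset` and both thresholds are assumptions picked for illustration, and the samples are assumed to be evenly spaced readings with the oldest first.

```python
def classify_onset(samples, spike_share=0.8, step_share=0.2):
    """Label a metric's rise as 'sudden spike', 'stepwise climb', or 'slow drift'.

    spike_share and step_share are illustrative thresholds, not tuned values.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    total_rise = samples[-1] - samples[0]
    if total_rise <= 0:
        return "no onset"
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    # A single jump accounts for most of the rise.
    if max(deltas) >= spike_share * total_rise:
        return "sudden spike"
    # A handful of large jumps together account for most of the rise.
    big_jumps = [d for d in deltas if d >= step_share * total_rise]
    if sum(big_jumps) >= spike_share * total_rise:
        return "stepwise climb"
    # Otherwise the rise is spread across many small increments.
    return "slow drift"


if __name__ == "__main__":
    print(classify_onset([10, 10, 10, 100, 100]))
    print(classify_onset([10, 10, 40, 40, 70, 70, 100]))
    print(classify_onset([10, 20, 30, 40, 50, 60, 70, 80, 90, 100]))
```

With only a few samples, a steady climb can be mislabeled as stepwise, so treat the label as a prompt for the specimen entry rather than a verdict.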

3. Habitat and Trigger Conditions

Every incident lives in a specific habitat:

Which environment? (prod, staging, a specific region)
Which dependencies? (DBs, queues, external APIs)
Which business events? (Black Friday, product launch, billing run)

This makes it easier to ask: Are we recreating the conditions for this thing to reappear? If so, the matching specimens in the tank serve as a warning before history repeats.
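The "are we recreating the conditions?" question can be asked mechanically if habitats are recorded as sets. This is a minimal sketch under that assumption: the `Habitat` class, the `specimens_at_risk` function, the `min_shared` threshold, and the sample habitat values are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Habitat:
    environments: set[str] = field(default_factory=set)
    dependencies: set[str] = field(default_factory=set)
    business_events: set[str] = field(default_factory=set)


def shared_conditions(past: Habitat, upcoming: Habitat) -> int:
    """Count the conditions two habitats have in common."""
    return (
        len(past.environments & upcoming.environments)
        + len(past.dependencies & upcoming.dependencies)
        + len(past.business_events & upcoming.business_events)
    )


def specimens_at_risk(collection: dict[str, Habitat], upcoming: Habitat,
                      min_shared: int = 2) -> list[str]:
    """Return names of past specimens whose habitat overlaps the upcoming one."""
    return sorted(
        name for name, habitat in collection.items()
        if shared_conditions(habitat, upcoming) >= min_shared
    )


if __name__ == "__main__":
    # Hypothetical habitats, invented for this example.
    tank = {
        "The Night of the Silent Queue": Habitat({"prod"}, {"queue"}, {"billing run"}),
        "Staging Ghost": Habitat({"staging"}, {"external API"}, set()),
    }
    launch = Habitat({"prod"}, {"queue", "DB"}, {"product launch"})
    print(specimens_at_risk(tank, launch))
```

A plain overlap count is blunt, but it is enough to put the right specimens on the table before a launch review.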

Incident Response as a Formal Discipline (Not Just "Heroic Debugging")

An aquarium is useless if nobody knows what to do when a new creature escapes into production. This is where treating incident response as a formal discipline comes in.

In the real world, emergency response is structured. Firefighters, EMTs, and police don’t all shout at once; in the United States, they operate under frameworks like the National Incident Management System (NIMS) and its Incident Command System (ICS), which provide:

Clear roles and responsibilities
Defined chains of command
Standard communication patterns

Borrow that structure for your digital world.

Borrowed ICS Roles for Your Incident Aquarium

In your incident response playbook, map ICS concepts to software. The roles below are loose adaptations, not one‑to‑one copies of the official ICS positions:

Incident Commander (IC): owns overall response; keeps the big picture, makes calls, manages escalation.
Operations Lead: coordinates hands‑on mitigation work: rollbacks, feature flags, scaling, failover.
Planning Lead: tracks hypotheses, timelines, and decisions; keeps a real‑time log; plans next steps.
Communications Lead: posts updates to internal channels, status pages, and stakeholders.
Liaison / Customer Lead: represents customer impact and prioritizes mitigations that reduce pain fastest.

In the aquarium metaphor, these are the aquarists and keepers. When a creature breaks the glass, everyone knows their job. After the fact, the aquarium entry records not just what the creature did, but how the crew moved around it.
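One way to make sure "everyone knows their job" is to check the roster at the moment an incident is declared. The sketch below assumes the five roles listed above; the `Role` enum, the `unfilled_roles` helper, and the responder names are hypothetical.

```python
from __future__ import annotations

from enum import Enum


class Role(Enum):
    INCIDENT_COMMANDER = "Incident Commander"
    OPERATIONS_LEAD = "Operations Lead"
    PLANNING_LEAD = "Planning Lead"
    COMMUNICATIONS_LEAD = "Communications Lead"
    LIAISON_LEAD = "Liaison / Customer Lead"


def unfilled_roles(assignments: dict[Role, str]) -> list[Role]:
    """Return roles with no named person, in the order they are defined."""
    return [role for role in Role if not assignments.get(role)]


if __name__ == "__main__":
    # Hypothetical roster at the start of an incident.
    roster = {
        Role.INCIDENT_COMMANDER: "alex",
        Role.OPERATIONS_LEAD: "sam",
    }
    for role in unfilled_roles(roster):
        print(f"Still needed: {role.value}")
```

The same roster can be copied into the specimen entry afterward, so the record shows how the crew moved and not only what the creature did.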

Analog vs. Digital Light Meters: Sensing "Something’s Off"

Photographers once relied on analog light meters—needles swinging gently to indicate exposure. You’d develop a feel: this scene looks like 1/125 at f/8. Today, cameras use fast, precise digital metering, but seasoned photographers still “feel” when the image will be wrong.

Incidents work the same way.

Analog sensing: Senior engineers feel that something’s off—the latency graph looks “wrong,” the deploy feels risky, the business chatter doesn’t match the dashboards. This is instinct, pattern matching, and experience.
Digital sensing: Metrics, SLOs, traces, and logs provide precise, quantitative signals. Alerts fire on numerical thresholds.

In a healthy organization, you want both:

Encourage engineers to voice analog concerns:
- “This shape of traffic feels like the last time we had a queue backup.”
- “This dashboard is quiet, but support is noisier than usual. Something’s lurking.”
Use the aquarium to align analog and digital:
- Each specimen entry includes both:
  - What did people feel or notice subjectively?
  - What did the metrics say objectively?

Over time, your team learns to calibrate their internal “light meters” against telemetry. Outages stop being pure surprises; they start as faint movements at the edge of the tank.
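One simple way to calibrate is to record, for each specimen, how long a human hunch preceded the first alert. This is a small sketch of that idea; the function names and the timestamps are hypothetical, and "first alert" is assumed to mean the first threshold-based page.

```python
from datetime import datetime


def detection_lead_minutes(first_human_notice: datetime, first_alert: datetime) -> float:
    """Minutes by which a human noticed before the first alert fired.

    Positive: the analog meter moved first. Negative: the alert came first.
    """
    return (first_alert - first_human_notice).total_seconds() / 60


def calibration_note(lead_minutes: float) -> str:
    if lead_minutes > 0:
        return "Humans noticed first; consider alerting on what they saw."
    if lead_minutes < 0:
        return "Alerts fired first; telemetry is ahead of intuition here."
    return "Humans and alerts noticed at the same time."


if __name__ == "__main__":
    # Hypothetical timestamps for illustration.
    lead = detection_lead_minutes(
        datetime(2024, 3, 14, 2, 10),
        datetime(2024, 3, 14, 2, 25),
    )
    print(lead, calibration_note(lead))
```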

Designing Your Own Incident Aquarium

You don’t need a literal glass tank on your desk (though that would be fun). You need a shared, visible, and story‑rich space where incidents live.

1. Pick Your Tank

Options:

A dedicated section in your internal docs or wiki
A Notion board, Miro board, or digital whiteboard
A static microsite hosted internally: /incident-aquarium

The key: it must feel like an exhibit, not just a folder of PDFs.

2. Define a Specimen Template

For every significant outage, create an entry with:

Name & Tagline
Date & Duration
Type/Taxonomy (e.g., Shadow Swimmer – intermittent latency)
Symptoms & Behavior (stepwise narrative in slow motion)
Habitat & Trigger Conditions
Response Roles & Timeline (who did what, when)
Analog Signals (what felt off, what humans noticed first)
Digital Signals (key metrics, dashboards, logs, traces)
Mitigations & Long‑Term Changes
What We Learned (3–5 crisp bullets)

Make it visually distinct: screenshots, charts, annotated timelines, even faux VHS overlays if your culture allows.
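If your tank lives in a repo or a microsite, the template can be expressed as a small data structure so that incomplete entries are easy to spot. This is a sketch under assumptions: the `Specimen` class and its field names are hypothetical renderings of the template above, and the only rule enforced is the "3–5 crisp bullets" guideline.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from datetime import date, timedelta


@dataclass
class Specimen:
    name: str
    tagline: str
    occurred_on: date
    duration: timedelta
    taxonomy: str
    symptoms: str
    habitat: str
    response_timeline: list[str]
    analog_signals: list[str]
    digital_signals: list[str]
    mitigations: list[str]
    lessons: list[str]

    def __post_init__(self) -> None:
        # Enforce the template's "3-5 crisp bullets" guideline.
        if not 3 <= len(self.lessons) <= 5:
            raise ValueError("What We Learned should have 3 to 5 bullets")

    def missing_sections(self) -> list[str]:
        """Return the names of sections left empty (empty string or empty list)."""
        missing = []
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, (str, list)) and not value:
                missing.append(f.name)
        return missing
```

A quick check with `missing_sections()` before a Tank Tour tells you which entries still need their analog signals or timeline filled in.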

3. Make Visits a Ritual

An aquarium is meant to be visited. Build:

Monthly “Tank Tours”: 30 minutes to revisit one or two specimens. Ask: Could we spot this earlier now?
Onboarding Walkthroughs: new engineers take a guided tour of the “Greatest Hits” outages.
Pre‑Launch Reviews: before big launches, ask: Which tank creatures might this attract?

This keeps the creatures alive in memory—but behind glass.

Continuous Learning: From Jump Scares to Nature Documentaries

The real value of the analog incident aquarium isn’t the aesthetics; it’s the shift in posture.

Outages stop being isolated shocks and become part of a living ecosystem of risk.
Incident response shifts from hero mode to coordinated, ICS‑inspired practice.
Engineers calibrate their analog instincts with digital telemetry, getting better at early detection.
The organization builds a shared language and memory around failure.

Instead of trying to outrun your outages, you’re building a transparent tank around them—observing, cataloging, and learning. The creatures don’t go away. But they do become less mysterious, less terrifying, and far more instructive.

Put the glass wall on your desk, even if it’s only in your mind and your wiki. Give your incidents names. Give them faces. Let them swim in slow motion where everyone can see.

And the next time the water starts to ripple in production, you’ll recognize the shape in the tank—and know what to do before it reaches the glass.