The Analog Incident Story Planetarium: Projecting Your Outage History Onto a Ceiling of Quiet Warnings

How to turn your incident history into a star‑filled, response‑centric planetarium that quietly teaches teams about resilience, interdependence, and better responses to future outages.

The Ceiling That Remembers: Why We Need an Incident Planetarium

Most organizations keep their outage history locked away in tools: ticketing systems, post‑incident docs, dashboards, and slide decks that only a handful of people ever revisit.

Imagine instead that your incident history lived above you.

Not in another dashboard tab, not buried in Confluence, but literally on the ceiling: a dark dome of your past outages turned into stars. Each star is a quiet warning: This happened. This is how it broke. This is how we recovered. Learn from me before you need me.

This is the idea behind the Analog Incident Story Planetarium—a physical, visual, response‑centric map of your operational past, designed not just to analyze incidents, but to teach from them.


Response‑Centric Taxonomy: Grouping by How You Fought the Fire

Most incident catalogs are organized by root cause: a missing null check here, a misconfigured firewall there, a dependency that ran out of quota. This is useful, but it tells only part of the story.

The planetarium starts with a response‑centric taxonomy instead. Incidents are grouped by:

  • How they were detected (customer reports, synthetic checks, logs, anomaly detection)
  • How they were contained (traffic shaping, feature flags, circuit breakers, rollbacks)
  • How they were resolved (config fix, code patch, infra scaling, vendor escalation)
  • What kept them from being worse (runbooks, chaos drills, canary releases, rate limiting)

By focusing on how we responded instead of just what failed, we reframe outages as training data for resilience, not just lists of mistakes.
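
To make this taxonomy concrete, here is a minimal sketch of what a response‑centric incident record and grouping could look like in code. The field names and category values are illustrative assumptions, not a standard schema:

```python
from dataclasses import dataclass, field
from enum import Enum

class Detection(Enum):
    CUSTOMER_REPORT = "customer_report"
    SYNTHETIC_CHECK = "synthetic_check"
    LOGS = "logs"
    ANOMALY_DETECTION = "anomaly_detection"

class Containment(Enum):
    TRAFFIC_SHAPING = "traffic_shaping"
    FEATURE_FLAG = "feature_flag"
    CIRCUIT_BREAKER = "circuit_breaker"
    ROLLBACK = "rollback"

@dataclass
class Incident:
    incident_id: str
    title: str
    detection: Detection
    containment: Containment
    resolution: str                 # e.g. "config fix", "code patch", "vendor escalation"
    mitigations: list[str] = field(default_factory=list)  # what kept it from being worse

def group_by_containment(incidents: list[Incident]) -> dict[Containment, list[Incident]]:
    """Group incidents by how they were contained, not by what caused them."""
    groups: dict[Containment, list[Incident]] = {}
    for incident in incidents:
        groups.setdefault(incident.containment, []).append(incident)
    return groups
```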

In the planetarium, this taxonomy controls the constellations:

  • A Constellation of Rollbacks: incidents where reverting quickly saved the day.
  • A Cluster of Human-in-the-Loop Saves: outages where on‑call skill and improvisation were critical.
  • A Galaxy of Slow‑Burn Degradation: long, low‑grade incidents that taught you about observability gaps.

You’re not just staring at blame; you’re looking at your collective repertoire of responses.


Plotting Outages as Stars: A Precise, Visual Language

Each incident becomes a star on the ceiling. The placement isn’t random. It’s driven by data:

  • Position (x/y): encodes relationships, such as affected subsystems or response pattern family.
  • Brightness: reflects severity or impact (e.g., customer minutes affected, revenue at risk).
  • Color: might indicate primary failure mode (network, storage, deploy, configuration, dependency).
  • Halo or ring size: could represent time to detect or time to recover.
  • Twinned or binary stars: encode linked incidents (e.g., a major outage and its follow‑up regression).

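A rough sketch of how these encodings might be computed from incident data; the thresholds, palette, and field names below are assumptions for illustration, not fixed rules:

```python
import math

# Same color should always mean the same failure mode (hypothetical palette).
FAILURE_MODE_COLORS = {
    "network": "blue",
    "storage": "green",
    "deploy": "red",
    "configuration": "orange",
    "dependency": "purple",
}

def incident_to_star(incident: dict) -> dict:
    """Translate one incident record into the visual attributes of a star."""
    return {
        # Brightness grows with impact, log-compressed so one mega-incident
        # doesn't wash out the rest of the sky.
        "brightness": math.log10(1 + incident["customer_minutes_affected"]),
        "color": FAILURE_MODE_COLORS.get(incident["failure_mode"], "white"),
        # Halo radius encodes time to recover, in minutes.
        "halo_radius": incident["minutes_to_recover"],
        # Linked incidents render as binary stars.
        "binary_with": incident.get("linked_incident_id"),
    }

star = incident_to_star({
    "customer_minutes_affected": 1800,
    "failure_mode": "deploy",
    "minutes_to_recover": 42,
})
```
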
Over time, the ceiling becomes a precise, spatial history of your operational life. A visitor can lie back, look up, and read the story visually:

  • A bright red cluster near the “edge”? High‑severity deployment failures around a new service.
  • A pale blue arc stretching across the room? Minor but frequent config mishaps in legacy components.

Every dot has a story, and the medium itself pushes you to remember that incidents are objects of study, not just past pain.


Quiet Warnings: Encoding Lessons in the Stars

If all you do is plot outages, you’ve built a pretty mural, not a learning tool. The magic is in turning each star into a quiet warning.

Each star can be annotated—physically or via an associated digital index—with:

  • Short narrative: “Black Friday 2022: checkout stalled for 18 minutes; temporary queueing and manual traffic throttling stabilized us.”
  • Key response tactic: e.g. “Rollback within 6 minutes; feature flag kill switch; manual failover.”
  • Lessons learned: “Add automatic rollback conditions; pre‑flight tests for config; better runbook for failover.”
  • Practice prompt: a short question such as “If this started right now, what would you look at first?”

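To keep annotations consistent from star to star, each one can follow a small fixed card. This sketch mirrors the fields above; the incident identifier is hypothetical:

```python
star_annotation = {
    "incident_id": "INC-2022-1124",  # hypothetical identifier
    "narrative": (
        "Black Friday 2022: checkout stalled for 18 minutes; temporary "
        "queueing and manual traffic throttling stabilized us."
    ),
    "response_tactics": [
        "Rollback within 6 minutes",
        "Feature flag kill switch",
        "Manual failover",
    ],
    "lessons": [
        "Add automatic rollback conditions",
        "Pre-flight tests for config changes",
        "Better runbook for failover",
    ],
    "practice_prompt": "If this started right now, what would you look at first?",
}
```
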
In a weekly review or onboarding session, someone points to a star and says, “Let’s tell the story of this one.” The ceiling becomes a memory palace of operational wisdom.

Over time, patterns emerge:

  • Stars representing “manual heroics” sit in one corner—reminding you where automation is still missing.
  • Stars representing “observability gaps” cluster in another—pointing to weak spots in monitoring.

The goal is not to shame past errors, but to normalize learning:

Everyone here breaks things. Everyone here learns. These are our stories.


Showing System Interdependence: The Sky of Cascades

Modern systems fail in networks, not in isolation. A failure in one subsystem fans out into others: timeouts pile up, retries storm, queues overflow, caches thrash.

The planetarium surface is perfect for making interdependence visible:

  • Lines between stars represent dependencies: a star in the payments region linked to one in the database galaxy.
  • Incident orbits show how one failure “pulled” another into motion.
  • Constellations of cascades visualize repeated chain reactions: “auth → API gateway → mobile clients”.

This gives your team a spatial intuition for systemic risk:

  • You notice that most high‑severity stars connect to your auth or storage constellations.
  • You realize a minor‑looking service sits at the intersection of many lines—quietly critical.

Standing under the ceiling, you can ask:

  • “If this service vanished, where would the ripples go?”
  • “Why do cascading outages always involve this queue system?”
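
Those questions can be rehearsed against the data as well as under the ceiling. Here is a minimal sketch of a "where would the ripples go?" query over a toy dependency map; the service names are placeholders:

```python
from collections import deque

# Toy dependency map: service -> services that depend on it.
DEPENDENTS = {
    "auth": ["api-gateway", "admin-console"],
    "api-gateway": ["mobile-clients", "web-frontend"],
    "orders-queue": ["fulfillment", "notifications"],
}

def ripple(service: str) -> list[str]:
    """Breadth-first walk: everything downstream that would feel the failure."""
    seen, queue = set(), deque([service])
    while queue:
        current = queue.popleft()
        for dependent in DEPENDENTS.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)

print(ripple("auth"))
# ['admin-console', 'api-gateway', 'mobile-clients', 'web-frontend']
```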

Instead of learning these relationships only during a 3 a.m. incident, you bring that learning forward into daylight hours, in a calmer room.


Learning Design: Making Complexity Child‑Friendly

Teaching incident response often leans heavily on dense docs and jargon. The planetarium borrows from educational design—children’s science museums, star shows, and classroom charts—to make complexity digestible.

Clear Presentation Formats

  • Layers of detail:
    • At a distance, you just see patterns: bright zones, dense clusters, lonely outliers.
    • Up close (or with a companion app), you see incident numbers, durations, and timelines.
  • Consistent visual rules: same color always means the same failure mode; same brightness scale for severity.

Child‑Friendly Explanations

Every complex idea gets a version that a non‑engineer (or a child) could grasp:

  • Instead of “we experienced an availability regression due to circuit breaker misconfiguration”:
    • “Our safety switch didn’t work, so too many requests hit a broken part and everything jammed.”
  • Instead of “slow burn CPU saturation on a shared node pool”:
    • “A lot of small tasks crowded onto the same machines until they had no room to breathe.”

This doesn’t dumb things down; it opens them up. Product managers, support teams, leaders, and new hires can all stand under the same ceiling and understand enough to ask good questions.


Constellations, Clusters, and Galaxies of Failure Patterns

The astronomical metaphor isn’t just aesthetic—it’s structural.

  • Constellations: manually defined patterns you want everyone to recognize.
    • “Deployment Dragons”: incidents caused or fixed by deploys.
    • “Latency Serpents”: issues where response time quietly climbed for weeks.
  • Clusters: dense regions that emerge from the data.
    • A cluster of stars that all involve one particular message queue.
    • A knot of incidents in the same 2‑hour window after weekly deploys.
  • Galaxies: higher‑level families of incidents.
    • The Galaxy of External Dependencies: DNS, third‑party APIs, payment gateways.
    • The Galaxy of Internal Misconfigurations: config flags, IAM policies, timeouts.

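On the data side, these groupings can start from very simple rules. The sketch below uses hypothetical incident records and hand-picked definitions purely to illustrate the three levels:

```python
from collections import Counter, defaultdict

# Hypothetical incident records, reduced to the fields this sketch needs.
incidents = [
    {"id": "INC-101", "trigger": "deploy", "component": "orders-queue"},
    {"id": "INC-102", "trigger": "deploy", "component": "checkout"},
    {"id": "INC-103", "trigger": "dns", "component": "edge"},
    {"id": "INC-104", "trigger": "config", "component": "orders-queue"},
]

# Constellations: manually defined patterns you want everyone to recognize.
CONSTELLATIONS = {"Deployment Dragons": lambda i: i["trigger"] == "deploy"}
constellations = {
    name: [i["id"] for i in incidents if belongs(i)]
    for name, belongs in CONSTELLATIONS.items()
}

# Clusters: dense regions that emerge from the data, e.g. one busy component.
component_counts = Counter(i["component"] for i in incidents)
clusters = {c: n for c, n in component_counts.items() if n >= 2}

# Galaxies: coarse families, here split on whether the trigger is external.
EXTERNAL_TRIGGERS = {"dns", "third_party_api", "payment_gateway"}
galaxies = defaultdict(list)
for i in incidents:
    family = "External Dependencies" if i["trigger"] in EXTERNAL_TRIGGERS else "Internal"
    galaxies[family].append(i["id"])
```
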
When you introduce new teammates, you can literally point and say:

  • “That’s the galaxy of things we don’t fully control. We invest heavily in mitigations here.”
  • “This constellation? These were all fixed by the same playbook. Learn that one early.”

Patterns stop being abstract. They become places people can refer to, visit, and remember.


An Analytical Tool and a Reflective Space

The Analog Incident Story Planetarium is two things at once:

  1. Analytical Instrument
    • It encodes real metrics and data.
    • It helps you see clusters, recurring chains, and hotspots.
    • It surfaces where detection, containment, and recovery are strong or fragile.
  2. Reflective Room
    • A quiet space where teams can lie on the floor and look up.
    • A ritual setting for post‑incident reviews and quarterly retros.
    • A cultural artifact that says: “We honor our past incidents by learning from them.”

Teams can:

  • Do guided tours: once a month, walk through a couple of stars and constellations.
  • Run simulation sessions: point to a star and rehearse what they’d do if it happened again today.
  • Use it for onboarding: give new hires a 30‑minute “night sky” tour of your operational history.

The ceiling becomes both a mirror and a compass: reflecting what’s happened, pointing at what to practice next.


Building Your Own Incident Planetarium

You don’t need a custom‑built dome to start. You can approximate the approach in stages:

  1. Gather and re‑index incidents using a response‑centric taxonomy: detection, containment, resolution.
  2. Choose visual encodings for severity, type, and duration—then stick to them.
  3. Sketch a star map on paper or a whiteboard first: clusters, constellations, galaxies.
  4. Move to the ceiling:
    • Simple version: glow‑in‑the‑dark stickers or printed star charts.
    • Advanced version: a projector linked to an incident dataset.
  5. Add stories: short, readable narratives and practice prompts for each star.
  6. Use it regularly: retros, training, cross‑team reviews, leadership briefings.

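For the “advanced version” in step 4, here is a minimal sketch of rendering a projectable chart from an incident dataset. The file name, column names, and palette are assumptions for this example:

```python
import csv
import math
import matplotlib.pyplot as plt

# Assumed CSV columns: x, y, customer_minutes_affected, failure_mode
COLORS = {"network": "tab:blue", "storage": "tab:green", "deploy": "tab:red",
          "configuration": "tab:orange", "dependency": "tab:purple"}

xs, ys, sizes, colors = [], [], [], []
with open("incidents.csv", newline="") as f:
    for row in csv.DictReader(f):
        xs.append(float(row["x"]))
        ys.append(float(row["y"]))
        # Marker area stands in for brightness/severity.
        sizes.append(40 * math.log10(1 + float(row["customer_minutes_affected"])))
        colors.append(COLORS.get(row["failure_mode"], "white"))

fig, ax = plt.subplots(figsize=(16, 9), facecolor="black")
ax.set_facecolor("black")
ax.scatter(xs, ys, s=sizes, c=colors, edgecolors="none")
ax.set_axis_off()
fig.savefig("star_chart.png", dpi=200, facecolor="black")  # print it or project it
```
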
The value doesn’t come from perfection, but from repetition: returning, looking up, and letting the past quietly shape how you respond in the future.


Conclusion: Learning to Read Your Own Night Sky

Every organization already has a night sky of incidents—hundreds of moments when things broke and people scrambled, learned, and improved. Most of that sky is invisible, scattered across tools and memories.

The Analog Incident Story Planetarium is a way to collect that sky into one place, to see your failures as stars instead of scars.

Under that ceiling of quiet warnings, teams can:

  • Notice the patterns that were always there.
  • Practice better responses before the next real outage.
  • Build a culture where learning from incidents is as normal as having them.

The incidents will keep coming. The question is whether they vanish into logs, or whether they light up a sky you can learn to read together.
