The Analog Incident Story Observatory Dome: A Desk-Sized Planet for Watching Tiny Failures Orbit Your System

Imagine your software system as a miniature solar system, where tiny failures orbit like comets and planets. This post explores the “Incident Story Observatory Dome” — an analog, desk-sized planetarium for visualizing dependencies, incident trajectories, and how LLM-powered mission control can help you predict and steer incidents before they crash.

What if you could put your entire system on your desk?

Not as a dashboard, not as a diagram in a wiki, but as a physical, glowing planet under a glass dome. Inside it: tiny lights, spinning paths, orbits of failure drifting around like satellites. Each orbit tells a story — how a small bug, a config tweak, or a partial outage took a specific route through your architecture.

This is the idea of the Incident Story Observatory Dome: a desk-sized planet for watching tiny failures orbit your system.

It’s a metaphor — but a powerful one. It reframes incidents not as random explosions, but as predictable, structured trajectories through a gravitational field of dependencies, feedback loops, and architectural choices.


Thinking in Orbits, Not Outages

We usually talk about incidents as discrete events:

  • “Redis went down.”
  • “The payment service timed out.”
  • “We had a partial outage in EU-West.”

In the Observatory Dome model, incidents are instead thought of as orbits of tiny failures.

Each failure — a retry storm, a cache miss spike, a misconfigured feature flag — becomes a small “body” entering your system’s gravitational field. Its path (the incident’s trajectory) reveals hidden structure:

  • Which services does it pass near first?
  • Which dependencies bend its path and amplify it?
  • Where does it pick up energy (feedback loops) or lose it (circuit breakers, bulkheads, throttling)?

Visualized as an orbit, an incident is no longer just “it broke”; it’s a path through relationships.


The Dome as an Analog Dependency Map

Most dependency maps live in PDFs or auto-generated graphs that few people actually study. The Observatory Dome takes a different stance: it is an analog visualization that makes dependencies feel physical.

Imagine the dome like a small planetarium of your system:

  • Services as bodies: Core services are massive planets; supporting services are moons; external APIs are distant, slow-moving objects.
  • Dependencies as gravity: A heavy dependency (like your primary database) exerts strong “gravitational pull” — almost every incident path bends around it.
  • Traffic and load as energy: High-traffic paths become busy orbital lanes; incident particles that enter these lanes accelerate or destabilize.

Within the dome, when a failure “particle” is introduced at some component:

  • You can see how it’s pulled into certain orbits.
  • You can see which bodies it’s likely to collide with.
  • You can see where collisions propagate more failures.

This shifts your mental model from boxes and arrows to force fields and trajectories. Dependencies are no longer static lines; they’re dynamic influences.
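
To move from metaphor toward something you can actually poke at, you can encode the dome as a small directed graph: services as nodes with a “mass,” dependencies as edges with a “pull.” Here is a minimal sketch using networkx; the service names, masses, and pull values are purely illustrative, not a real system.

    # The dome as a weighted dependency graph: node "mass" ~ how central
    # a service is, edge "pull" ~ how strongly a caller depends on a callee.
    # All names and numbers are illustrative.
    import networkx as nx

    dome = nx.DiGraph()

    # Bodies: core services are heavy planets, helpers are moons,
    # external APIs are distant, slow-moving objects.
    dome.add_node("primary-db", mass=10)
    dome.add_node("auth", mass=6)
    dome.add_node("checkout", mass=5)
    dome.add_node("recommendations", mass=2)
    dome.add_node("email-api", mass=1)

    # Gravity: directed dependencies, weighted by how hard they pull.
    dome.add_edge("auth", "primary-db", pull=0.9)
    dome.add_edge("checkout", "primary-db", pull=0.8)
    dome.add_edge("checkout", "auth", pull=0.7)
    dome.add_edge("recommendations", "primary-db", pull=0.4)
    dome.add_edge("checkout", "email-api", pull=0.2)

    # Which bodies bend the most incident paths toward themselves?
    total_pull = {
        node: sum(data["pull"] for _, _, data in dome.in_edges(node, data=True))
        for node in dome.nodes
    }
    print(sorted(total_pull.items(), key=lambda kv: -kv[1]))

Even this toy version reproduces the intuition above: almost every path bends around primary-db.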


Borrowing from Space Trajectory Planning

Space missions are famously sensitive to small changes. A tiny delta-v at the right point can send a spacecraft to a completely different orbit.

Your incidents are no different.

Small perturbations in your system:

  • a minor config change,
  • a small performance regression,
  • a barely noticeable timeout increase,

can send failures onto totally different paths.

From a trajectory-planning perspective, an incident has:

  • Launch conditions: Where and how the failure starts (a deploy, a regional blip, a dependency degradation).
  • Transfer orbits: The sequence of services the failure affects as it moves through retries, queues, and fallbacks.
  • Gravity assists: Feedback loops (like cascading retries) that accelerate the incident’s impact.
  • Stability points: Regions where mitigation measures (circuit breakers, rate limiting, load shedding) absorb or deflect the failure.
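
One way to carry this framing into your postmortem tooling is to record each incident in exactly those four terms. A minimal sketch, where the field names and example values are illustrative rather than any standard schema:

    # An incident recorded as a trajectory rather than a single event.
    # Field names and the example values are illustrative.
    from dataclasses import dataclass


    @dataclass
    class IncidentOrbit:
        launch: str                    # where and how the failure started
        transfer_orbit: list[str]      # services it affected, in order
        gravity_assists: list[str]     # feedback loops that amplified it
        stability_points: list[str]    # mitigations that absorbed or deflected it


    retry_storm = IncidentOrbit(
        launch="deploy raised the retry budget on checkout",
        transfer_orbit=["checkout", "auth", "primary-db"],
        gravity_assists=["cascading retries", "queue backlog released all at once"],
        stability_points=["db connection pool limit", "regional load shedding"],
    )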

If you map this in the dome, you start seeing non-linear routes:

  • A tiny bug in a logging library that, under certain traffic conditions, tips a CPU-bound service over the edge.
  • A small config tweak to retry thresholds that turns a transient dependency blip into a full-throttle retry storm.
  • A small latency increase in one region that causes traffic to migrate in a way that overloads a previously safe cluster.

In other words, you can practice incident trajectory design:

  • Where can you place “gravity wells” of resilience?
  • Where are “transfer orbits” dangerously close to fragile components?
  • Where can a small launch error (minor bug) lead to a catastrophic orbit?

Systems Thinking: It’s About Relationships, Not Parts

The dome forces a systems thinking approach.

Instead of asking:

“Why did service X fail?”

You ask:

“What relationships and feedback loops made this tiny issue grow into a large incident?”

With that lens:

  • Retries aren’t just a setting; they are amplifiers of energy in certain orbital lanes.
  • Queues aren’t just buffers; they are time-delayed gravity wells that can store incident energy and release it later.
  • Feature flags aren’t just toggles; they are sudden changes in mass distribution that can shift traffic or dependencies in non-linear ways.
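
That first point, retries as amplifiers, is easy to put numbers on: if every layer in a call chain independently retries a failing downstream call, the attempts multiply layer by layer. A back-of-the-envelope sketch (the figures are illustrative):

    # Worst-case retry amplification: if each of `layers` services makes up
    # to `attempts_per_layer` attempts (1 call + retries) against the layer
    # below it, a single failing dependency sees attempts_per_layer**layers
    # times the normal traffic.
    def worst_case_attempts(layers: int, attempts_per_layer: int) -> int:
        return attempts_per_layer ** layers


    for layers in (1, 2, 3, 4):
        print(layers, "layers ->", worst_case_attempts(layers, 3), "x load")
    # 1 layers -> 3 x load
    # 2 layers -> 9 x load
    # 3 layers -> 27 x load
    # 4 layers -> 81 x load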

By watching the orbits of past incidents, you learn:

  • Which relationships create positive feedback loops.
  • Which subsystems act as shock absorbers.
  • Which patterns (e.g., “degraded but not down” states) repeatedly produce runaway orbits.

This is where the dome becomes less a toy and more a thinking tool: you see the system as interacting loops, not isolated boxes.


Dependency and Network Analysis Under the Dome

To make the dome truly useful, it should embody network analysis concepts:

  • Critical nodes (high degree): Services with many incoming/outgoing dependencies are bigger, brighter planets.
  • High-betweenness components: Services that sit on many shortest paths (e.g., auth, routing, payment gateways) become visible “choke points” whose failure orbit would touch many others.
  • Fragile links: Latency-sensitive or low-resilience connections are visualized as thin, brittle orbital paths, easily disrupted by noise.
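
All three of these properties fall out of standard graph metrics, so you can compute a first approximation directly from an exported dependency graph. A minimal sketch with networkx; the edge list is illustrative:

    # Spotting likely choke points with standard graph metrics.
    # The edge list is illustrative.
    import networkx as nx

    edges = [
        ("web", "auth"), ("web", "checkout"), ("mobile", "auth"),
        ("checkout", "auth"), ("checkout", "payments"),
        ("payments", "primary-db"), ("auth", "primary-db"),
    ]
    g = nx.DiGraph(edges)

    # Critical nodes: many incoming/outgoing dependencies.
    degree = dict(g.degree())

    # High-betweenness components: sit on many shortest paths.
    betweenness = nx.betweenness_centrality(g)

    for node in sorted(g.nodes, key=lambda n: -betweenness[n]):
        print(f"{node:15s} degree={degree[node]}  betweenness={betweenness[node]:.2f}")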

By running incident scenarios inside the dome, you can:

  • Identify which nodes, if perturbed, generate the most complex orbits.
  • See which paths are repeatedly traversed by different incidents (hot orbital lanes).
  • Discover which “small” components secretly function as critical transit hubs.

Over time, your dome becomes a living incident atlas — a record of where tiny failures tend to travel and where they tend to explode.


Toward a “Theory of Dependencies” for Software

Most engineering teams have a theory of performance, a theory of testing, a theory of deployment.

Few have a real theory of dependencies.

The Incident Story Observatory Dome is a physical metaphor for developing such a theory. You can:

  • Experiment with architectures: What if you add a new dependency? What if you introduce a new caching layer? How does the gravitational field change?
  • Explore different structures: Treat monoliths, microservices, and modular monoliths as different planetary configurations (a single massive planet, many small ones, or constellations).
  • Compare reliability trade-offs: What happens to failure orbits when you:
    • add an extra shared service,
    • break a service into multiple smaller services,
    • centralize vs. localize a critical capability?

A theory of dependencies is about understanding how the shape of your system governs:

  • The kinds of incidents you get.
  • The routes those incidents tend to follow.
  • How hard they are to detect and contain.

The dome gives you a way to see and communicate that theory — to new team members, leadership, and even yourself.


LLMs and Agents as Mission Control

Where do LLMs and agentic tools fit in? Think of them as mission control for your desk-sized planet.

If the dome is the orbital model, then mission control is:

  • Continuous observation: Ingesting logs, traces, metrics, and events to infer where new failure particles are entering your system.
  • Trajectory prediction: Given a detected anomaly in service A under current load and configuration, estimating the most likely incident orbits (a naive walk-the-graph sketch follows this list):
    • “This will likely impact auth, then user profile, then checkout within 15 minutes.”
  • Suggested maneuvers: Recommending actions as mid-course corrections:
    • lower retry limits,
    • shed load from specific regions,
    • temporarily disable a feature flag,
    • fail over a particular dependency.
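
The trajectory-prediction item doesn’t have to start as a learned model. Even a naive weighted walk over the dependency graph gives mission control a first guess at where an orbit is heading. A deliberately simple sketch; the graph, the propagation probabilities, and the cut-off are all assumptions that a real system would calibrate from incident history:

    # Naive trajectory prediction: walk "who depends on the failing service"
    # edges breadth-first, multiplying per-edge propagation probabilities.
    # The graph, probabilities, and threshold are illustrative.
    from collections import deque

    # propagates_to[failing_service] = [(dependent_service, probability), ...]
    propagates_to = {
        "primary-db": [("auth", 0.8), ("checkout", 0.7)],
        "auth": [("checkout", 0.6), ("user-profile", 0.5)],
        "checkout": [],
        "user-profile": [],
    }


    def predict_orbit(start: str, threshold: float = 0.3) -> dict[str, float]:
        """Estimate which services a failure starting at `start` is likely to reach."""
        likely = {start: 1.0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt, p in propagates_to.get(current, []):
                score = likely[current] * p
                if score >= threshold and score > likely.get(nxt, 0.0):
                    likely[nxt] = score
                    queue.append(nxt)
        return likely


    print(predict_orbit("primary-db"))
    # {'primary-db': 1.0, 'auth': 0.8, 'checkout': 0.7, 'user-profile': 0.4}

An LLM-backed mission control would replace the hand-written table with probabilities learned from traces and past incidents, and translate the result into the kind of plain-language forecast quoted above.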

With enough historical incident data, an LLM-backed mission control can:

  • Recognize patterns of orbits similar to past incidents.
  • Simulate “what if” scenarios for runbooks.
  • Help you design better gravitational fields (dependencies, limits, mitigations) that make dangerous orbits unlikely.

The dome is the analog front-end; the LLM/agents are the digital brains behind it.


Bringing the Dome Mindset into Your Team

You don’t need an actual glass dome to start thinking this way (though building one would be fun).

To adopt the Observatory Dome mindset:

  1. Retell incidents as orbits

    • In postmortems, describe:
      • Where the failure “launched.”
      • Which services it passed.
      • Which feedback loops amplified it.
      • Where it was finally captured or dissipated.
  2. Map gravitational bodies and transfer orbits

    • Identify:
      • Core data stores (massive planets).
      • Shared infrastructure (busy orbital hubs).
      • Fragile links (thin, breakable connections).
  3. Design for trajectory shaping

    • Add or tune:
      • Circuit breakers and bulkheads as “atmospheric drag” (a minimal breaker sketch follows this list).
      • Rate limits as “escape velocity requirements.”
      • Fallback paths as alternate orbits.
  4. Empower mission control

    • Feed your LLM/incident tools with:
      • dependency graphs,
      • incident histories,
      • runbooks and mitigation options,
      • SLOs and business priorities.
    • Ask them not just “what broke?” but “where is this orbit heading?”
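
For the “atmospheric drag” item in step 3, even a tiny in-process circuit breaker illustrates the idea: after enough consecutive failures, stop feeding energy into the failing orbit for a while. A minimal sketch; the thresholds and timings are illustrative, and in production you would reach for an established resilience library rather than rolling your own:

    # A minimal circuit breaker: after `max_failures` consecutive errors the
    # breaker opens and rejects calls for `reset_after` seconds, so the failing
    # dependency stops attracting more load. Thresholds are illustrative.
    import time


    class CircuitOpenError(RuntimeError):
        pass


    class CircuitBreaker:
        def __init__(self, max_failures: int = 5, reset_after: float = 30.0):
            self.max_failures = max_failures
            self.reset_after = reset_after
            self.failures = 0
            self.opened_at = None  # monotonic timestamp when the breaker opened

        def call(self, fn, *args, **kwargs):
            if self.opened_at is not None:
                if time.monotonic() - self.opened_at < self.reset_after:
                    raise CircuitOpenError("circuit open; shedding load")
                # Half-open: allow one trial call through.
                self.opened_at = None
                self.failures = 0
            try:
                result = fn(*args, **kwargs)
            except Exception:
                self.failures += 1
                if self.failures >= self.max_failures:
                    self.opened_at = time.monotonic()
                raise
            self.failures = 0
            return result

Wrapping a flaky dependency call in breaker.call(...) turns a runaway retry orbit into fast, local failures that your fallback paths can absorb.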

Conclusion: A Better Way to Watch Things Go Wrong

Every system lives under a constant rain of tiny failures: bad inputs, transient timeouts, noisy neighbors, partial deploys. Most never become incidents — their orbits decay quietly, absorbed by the resilience mechanisms you’ve built.

But some failures slip into precarious trajectories, brushing close to just the wrong services, at just the wrong time.

The Incident Story Observatory Dome is a way to make that invisible structure visible. To see incidents not as surprises, but as paths written into the shape of your system. To experiment with dependencies, feedback loops, and mitigations in a way that feels physical and understandable.

And as LLM- and agent-powered mission control matures, you’ll have help not just watching those orbits but steering them — turning your desk-sized planet into a lab for building systems where even when things go wrong, they do so in predictable, containable ways.

In the end, the dome is less a gadget and more a mindset: treat your system as a small solar system of interacting bodies, and your incidents as orbits you can study, predict, and design against — before they crash back to earth.
