The Analog Incident Story Compass Clock: Pointing to Your Next Outage Before It Happens
How an imaginary desk-sized dial reveals what’s missing in today’s incident management tools—and why we need systems that actually point to the next outage, not just record the last one.
The Analog Incident Story Compass Clock: A Desk-Sized Dial That Points to Where Your Next Outage Will Begin
Imagine walking into your operations war room and seeing a heavy, brass desk clock in the center of the table.
Instead of hands that show hours and minutes, it has a single, beautifully machined needle. Around the circumference are labels: Database, Auth Service, Payment Gateway, Network Edge, CI/CD, Third-Party APIs, and more.
Every few minutes, the needle twitches, then settles—pointing to the subsystem most likely to cause your next outage.
That’s the Analog Incident Story Compass Clock: a fictional instrument that, by its impossibility, exposes a very real problem in how we build and use incident management tools today.
We’ve gotten very good at administering incidents. We’re far worse at anticipating them.
This post uses that imaginary desk-sized dial as a metaphor for what modern incident systems are missing—and how ideas from emergency services, dynamic interfaces, and high-stakes reliability engineering can help us build tools that actually point toward the next failure.
The State of Incident Management: Great at Logistics, Weak at Insight
Most incident management platforms today are designed around operations and administration:
- Create an incident
- Assign responders
- Spin up a Zoom or bridge call
- Track status and timelines
- Log updates for postmortems and compliance
These capabilities are essential. But they share a blind spot: they describe the outage that is already happening, not the one that’s about to unfold.
The Logistics–Insight Gap
Between “Who’s on call and what’s the current status?” and “Where is this going to break next?” lies a major gap:
- Tools excel at workflow orchestration (paging, ticketing, communication).
- Fewer tools help teams understand root causes in real time.
- Almost none help forecast where the next outage might begin.
Your dashboards show latency, error rates, and CPU usage. Your incident tool shows who’s working on which problem. But what connects them into a narrative? Where’s the compass that says:
“If this trend continues, the next failure is most likely to start in your auth service.”
The Story Compass Clock seems absurd precisely because we have no systems that synthesize these signals into a directional prediction. We have fragments; we lack a story engine.
What High-Stakes Domains Get Right
To see what’s missing, it’s helpful to look outside typical software operations.
1. Emergency Services: Dynamic, Mission-Critical Interfaces
Modern emergency response teams don’t flip between static maps, paper binders, and radios. They use dynamic interfaces that surface the right information at the right time:
- Navigation to the incident site
- Pre-plans for critical buildings (floor plans, hydrant locations, hazardous materials)
- Custom map layers showing things like restricted zones, prior calls, or key utilities
These interfaces aren’t generic dashboards. They are mission-specific views that integrate location, resources, and context into one operational picture.
This is the real-world counterpart of our Story Compass Clock: when something happens, responders can immediately see where it is, what is involved, and how to respond, not just that something is wrong.
2. Mobile Data Terminals: Purpose-Built, Integrated Tools
Mobile Data Terminals (MDTs) in police cars, fire engines, and ambulances show how powerful purpose-built tools can be when they are fully integrated into daily workflows:
- Tailored interfaces for dispatch, status updates, navigation, and reporting
- Direct integration with CAD (Computer-Aided Dispatch) systems
- Automatic logging of locations, times, and actions
In incident response terms, MDTs are the opposite of “yet another tab in your browser.” They are deeply embedded into how responders work, so the technology disappears and the mission takes center stage.
3. LNG Carriers and Reliability Engineering: Predictive Thinking
Reliquefaction systems on liquefied natural gas (LNG) carriers operate in a context where failure can be catastrophic. Reliability assessment isn’t an afterthought; it’s fundamental.
These systems are designed and operated with predictive, reliability-focused thinking:
- Failure modes are systematically modeled.
- Maintenance is planned based on likelihood and impact, not just time.
- Operators understand where systems are most fragile and where the next issue is most likely to occur.
In other words, they have a mental and analytical compass for failure. Not a perfect predictor, but a disciplined, structured way to reason about what breaks next.
What an Incident Story Compass Would Actually Do
Let’s bring the metaphor closer to reality. If you could build a “Story Compass” for outages—digital, not brass—what would it need to do?
1. Aggregate All Relevant Data in One Place
Specialized incident response software for first responders works because it:
- Pulls in all relevant data (maps, building plans, historical incidents, resource statuses)
- Presents information in a single, coordinated view
In software operations, a Story Compass would:
- Ingest metrics, logs, traces, deployment history, feature flag changes, and config diffs
- Incorporate incident history and known hotspots
- Connect dependencies: which services talk to which, and how failures propagate
Instead of flipping between twelve tools, responders would see one integrated picture of system health.
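As a rough illustration of the dependency-mapping idea above, here is a minimal Python sketch that walks a service graph to find everything that could be affected when one component fails. The service names and the `DEPENDS_ON` map are hypothetical; a real system would derive the graph from tracing data or a service catalog.

```python
from collections import deque

# Hypothetical dependency map: each service lists the services it calls.
DEPENDS_ON = {
    "checkout": ["payments", "auth"],
    "payments": ["database"],
    "auth": ["database", "identity-provider"],
    "database": [],
    "identity-provider": [],
}


def impacted_by(failed: str, depends_on: dict[str, list[str]]) -> list[str]:
    """Return services that directly or transitively depend on `failed`."""
    # Invert the edges so we can walk from a dependency to its callers.
    callers: dict[str, list[str]] = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            callers.setdefault(dep, []).append(service)

    # Breadth-first walk over callers; `seen` also guards against cycles.
    seen: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, []):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return sorted(seen)


print(impacted_by("database", DEPENDS_ON))  # ['auth', 'checkout', 'payments']
```

Even a static map like this gives responders a first answer to "what else is at risk?" before any prediction is layered on top.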
2. Surface Dynamic, Mission-Critical Context at the Right Moment
Not all data is useful all the time. The key is context-sensitive surfacing:
- During an auth failure, emphasize user login flows, token services, and upstream identity providers.
- During a payment error spike, highlight gateway dependencies, fraud checks, and recent pricing or tax configuration changes.
Like custom map layers for first responders, the interface should adapt to the incident type in real time.
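One simple way to approximate this kind of context-sensitive surfacing is a lookup from incident category to the signals worth showing first. The sketch below is only an assumption about how that might look: the category names, the `CONTEXT_LAYERS` table, and the fallback list are all hypothetical.

```python
# Hypothetical mapping from incident category to the signals to show first.
CONTEXT_LAYERS = {
    "auth": ["login flows", "token services", "upstream identity providers"],
    "payments": [
        "gateway dependencies",
        "fraud checks",
        "recent pricing or tax configuration changes",
    ],
}

# Assumed generic view for categories that have no tailored layer yet.
DEFAULT_LAYER = ["error rate", "latency", "recent deployments"]


def signals_for(incident_type: str) -> list[str]:
    """Pick the context layer for an incident type, falling back to a generic view."""
    return CONTEXT_LAYERS.get(incident_type.strip().lower(), DEFAULT_LAYER)


print(signals_for("Auth"))
print(signals_for("network"))  # no tailored layer, so the default view is used
```

A table like this is deliberately dumb. Its value is that the choice of what to show first is written down and reviewable instead of living in one responder's head.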
3. Tell a Probable Story, Not Just a State Snapshot
The magic of our imaginary clock isn’t that it shows what’s broken now; it suggests where the story is heading.
A real system could:
- Use historical incidents to learn common failure pathways (e.g., “cache eviction → DB load spike → timeouts in user-facing services”).
- Correlate current signals with those pathways.
- Present a ranked list of likely next failure points, not as oracles, but as hypotheses:
  - “60% of past incidents with this pattern escalated to the payments service.”
  - “If DB CPU remains above 90% for 10 minutes, expect increased latency for checkout.”
This is predictive in the same sense as LNG reliability analysis: structured probabilistic thinking, not magic.
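To make the ranked-hypothesis idea concrete, here is a minimal sketch that counts how often past incidents with a given pattern escalated to each service. The `HISTORY` records and pattern names are invented for illustration, and plain frequency counting stands in for whatever statistical model a real system would use.

```python
from collections import Counter

# Hypothetical incident history: (observed pattern, service it escalated to).
HISTORY = [
    ("db_cpu_high", "payments"),
    ("db_cpu_high", "payments"),
    ("db_cpu_high", "payments"),
    ("db_cpu_high", "checkout"),
    ("db_cpu_high", "auth"),
    ("cache_eviction", "database"),
]


def rank_next_failures(
    pattern: str, history: list[tuple[str, str]]
) -> list[tuple[str, float]]:
    """Rank services by the share of past incidents with `pattern` that reached them."""
    outcomes = Counter(service for seen, service in history if seen == pattern)
    total = sum(outcomes.values())
    if total == 0:
        return []  # No precedent: offer no hypothesis rather than a guess.
    return [(service, count / total) for service, count in outcomes.most_common()]


for service, share in rank_next_failures("db_cpu_high", HISTORY):
    print(f"{share:.0%} of past incidents with this pattern escalated to {service}")
```

With only a handful of past incidents these shares are very noisy, so the output is best shown as a prompt for investigation rather than a forecast.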
4. Integrate into Day-to-Day Workflows
A compass you only touch during a storm is a toy, not a tool.
To be valuable, a Story Compass should:
- Be visible in normal operations (e.g., during deployments, load tests, or routine health checks)
- Influence how you plan capacity, maintenance, and architecture changes
- Fit seamlessly into existing channels (Slack, Teams, on-call apps), the way MDTs are baked into emergency workflows
When prediction and reliability thinking are built into everyday practice, your team’s intuition sharpens, and your tools can become more accurate as each incident adds to the history they learn from.
From Fantasy Clock to Practical Roadmap
You probably won’t put a literal brass dial on your SRE’s desk. But you can start building your own Story Compass in a few practical steps:
- Map your dependency graph. Know which systems feed which others, and how failures propagate.
- Centralize incident-relevant data. Even a basic cross-linking of metrics, logs, incidents, and deployments into a unified view is a leap forward.
- Identify your chronic hotspots. Use incident history to find the usual suspects—this is your first rough “dial.”
- Add context-aware views. For each major incident category (network, auth, data, third-party), define which signals matter most and create tailored dashboards or runbooks.
- Experiment with simple prediction. Start small: “When X rises and Y falls together, Z usually breaks next.” Encode those patterns as alerts, suggestions, or annotations.
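The last step can start very small. The sketch below encodes one “X rises and Y falls, so watch Z” pattern as a rule object; the metric names, the `TrendRule` class, and the first-versus-last comparison are all assumptions made for illustration, not a recommended detection method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrendRule:
    rising: str       # metric expected to rise (the "X")
    falling: str      # metric expected to fall (the "Y")
    likely_next: str  # component that has usually broken next (the "Z")


def delta(samples: list[float]) -> float:
    """Last sample minus first sample; 0.0 when the window is too short to judge."""
    return samples[-1] - samples[0] if len(samples) >= 2 else 0.0


def evaluate(rule: TrendRule, window: dict[str, list[float]]) -> Optional[str]:
    """Return a hypothesis string when the rule's pattern appears in the window."""
    x = window.get(rule.rising, [])
    y = window.get(rule.falling, [])
    if delta(x) > 0 and delta(y) < 0:
        return (
            f"{rule.rising} is rising while {rule.falling} is falling; "
            f"watch {rule.likely_next} next"
        )
    return None


# Hypothetical metrics sampled over the same time window.
rule = TrendRule(rising="queue_depth", falling="cache_hit_ratio", likely_next="database")
window = {"queue_depth": [10, 40, 90], "cache_hit_ratio": [0.95, 0.80, 0.60]}
print(evaluate(rule, window))
```

Comparing only the first and last samples is crude and sensitive to noise. A slope over the window or a minimum-change threshold would be a natural refinement once the rule has proven useful as an annotation.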
Each step makes the absurd fantasy of a Story Compass Clock a little less absurd.
Conclusion: Pointing Beyond the Pager
The Analog Incident Story Compass Clock is a story about what’s missing, not a blueprint for a gadget.
Most of our incident tools optimize for coordination and record-keeping, not understanding and foresight. High-stakes domains—emergency services, LNG carriers, and others—show that it’s both possible and necessary to think in terms of dynamic context, integrated data, and predictive reliability.
If your incident platform only tells you what’s broken and who’s on call, it’s a logbook, not a compass.
The next generation of incident systems will do more. They’ll aggregate all relevant signals, adapt to the situation, and help teams reason about where the story of failure is likely to go next.
You don’t need a brass dial to start.
You need tools—and practices—that turn your incidents from disconnected events into a coherent narrative, with a visible direction of travel. That’s when you stop merely reacting to outages and begin to navigate around them.