The Analog Incident Story Tidepool: A Desk-Sized Shoreline for Watching Tiny Failures Turn Into Outage Waves
How analogical thinking, tidepools, and blameless postmortems help SREs see tiny failures as early waves of future outages—before they crash into production.
The Analog Tidepool on Your Desk
Imagine a tiny shoreline sitting on your desk.
It’s not sand and seawater, but dashboards, logs, alerts, incident tickets, and Slack threads. Little disturbances wash in: a spike in latency here, a flaky test there, an error budget burndown that looks a bit steeper than usual.
Individually, they seem small—more like ripples than waves.
But viewed together, over time, they form something richer: a tidepool of incident stories. And if you learn to look at that tidepool the right way, you can see how those tiny failures, precursors, and near misses grow into full-blown outage waves.
This is the idea behind treating your incident data and operational history as an analog tidepool—a living, evolving environment where patterns, not just events, are the main characters.
In this post, we’ll explore why analogies are critical for understanding complex systems, how decades of research on analogical reasoning apply directly to SRE work, and how to build a culture and practice that turns scattered incidents into a coherent, predictive shoreline.
Why Analogies Belong in Your Incident Reviews
Analogies are not just cute metaphors for conference talks. Cognitive scientist Keith Holyoak has spent decades showing that analogical reasoning is a core mechanism of human intelligence and creativity.
We use analogies to:
- Grasp unfamiliar domains by relating them to familiar ones
- See deep relational similarities between things that look different
- Transfer solutions from one context to another
Holyoak’s work, across psychology, neuroscience, AI, and even poetry, shows that analogy is not a side effect of intelligence—it’s a central engine. When you say, “This outage feels a lot like that cache meltdown from last quarter,” you’re doing high-value cognitive work, not hand-waving.
In complex systems, surface details change constantly: different services, different data centers, new code paths, new on-call rotations. But the relational structure of failures—how causes interact, how signals appear, how decisions shape outcomes—often recurs.
That’s where analogies shine.
Analogies let SRE teams say not just what happened, but what this incident is like.
And that shift, from isolated event to member of a pattern, is what turns chaos into learning.
From Unique Outages to Familiar Patterns
A common anti-pattern in incident response is treating every outage as a one-off freak event:
- “That was a really weird corner case.”
- “This will never happen again.”
- “Totally different from what we’ve seen before.”
Sometimes that’s true—but not as often as it feels in the moment.
When SRE teams learn to view incidents analogically, they move from:
- “What exactly broke this time?” to
- “What family of failures does this belong to?”
For example:
- A failed database failover
- A misconfigured feature flag rollout
- A bad cache invalidation strategy
These may look unrelated. But analogically, they might all be instances of "unsafe reversibility assumptions"—places where you assumed you could roll back easily, but reality disagreed.
Once you name and recognize that pattern, you’re no longer just fixing incidents—you’re improving the system against a whole class of future failures.
Your Incident Tidepool: Watching Waves Form
Think of your system as a shoreline and your operational history as a tidepool:
- Every incident is a visible wave crashing onto the rocks.
- Every near miss is a wave that almost crested but broke early.
- Every warning sign is a subtle change in the water: currents shifting, foam patterns forming.
In safety science, concepts like:
- Accident precursors – small issues that share structure with much larger failures
- Accident pathogens – latent conditions in the system that quietly set up future accidents
- Near misses – incidents that were caught or self-resolved before causing major impact
- Warning signs – early signals that something is out of normal bounds
…all describe the ecosystem of small events that surround your big outages.
If you never look at these “tiny creatures” in your tidepool, you only ever see the large waves when they hit. If you do observe them, you can start to see:
- Chains of events forming
- Dependencies tightening
- Stress accumulating
The goal is not to eliminate every tiny failure (that’s impossible). The goal is to see how tiny failures organize themselves into outage waves over time.
The Accident Pathway: Outages as Chains, Not Lightning Strikes
Most outages don’t start with a single catastrophic failure point. They follow an accident pathway—a chain of contributing events, decisions, and conditions that slowly converge.
An accident pathway might look like:
1. A configuration default that made sense three years ago remains untouched.
2. A new service is built assuming that default is safe.
3. An SLO is defined without fully understanding the dependency.
4. A traffic spike hits, exposing the latent weakness.
5. An attempted mitigation interacts badly with another dependency.
6. The combined effects cascade into a major incident.
If you only look at step 6, you’ll blame “the last thing that broke.” But the pathway tells a more accurate story: this outage was years in the making.
Treating your system like a tidepool means:
- You don’t just log the crash—you collect and connect the earlier steps.
- You ask, “Where have we seen this pathway before?”
- You look for analogous pathways in prior incidents, even if the technology stack or services differ.
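One lightweight way to make "where have we seen this pathway before?" concrete is to describe each pathway as the relational patterns observed along its chain, then compare pattern sets across incidents. This is a minimal sketch under that assumption; the pattern names and the Jaccard-overlap scoring are illustrative choices, not a prescribed method:

```python
# Hypothetical: each pathway is the list of relational patterns observed
# along the chain of events, regardless of which services were involved.
def pathway_overlap(a: list[str], b: list[str]) -> float:
    """Jaccard similarity of the pattern sets of two pathways."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

# Last year's painful outage vs. this week's incident, as pattern chains.
last_year = ["stale_default", "unverified_assumption", "cascading_mitigation"]
this_week = ["stale_default", "unverified_assumption", "noisy_alerting"]

# Two shared patterns out of four distinct ones -> overlap of 0.50.
print(f"overlap: {pathway_overlap(last_year, this_week):.2f}")
```

A score like this is a prompt, not a verdict: a high overlap is an invitation to reread the earlier postmortem side by side with the new one.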
This is where analogical reasoning turns into practical foresight.
Blameless Postmortems as Tidepool Fieldwork
Blameless, structured postmortems are your fieldwork sessions at the tidepool.
Instead of asking, “Who messed up?”, you ask:
- What patterns are visible here?
- What earlier signals did we have?
- What precursors or pathogens were already present?
- What previous incidents does this resemble?
A strong postmortem culture:
- Normalizes human error instead of criminalizing it
- Focuses on system design, incentives, and information flows
- Captures rich narratives, not just timelines
- Encourages engineers to say, “You know, this reminds me of…”
That last part is crucial. It’s where Holyoak’s decades of research meet real-world SRE practice. When people are safe to speak openly, they will naturally use analogies and stories to make sense of complex events. Your job is to capture and organize those analogies, not filter them out.
Building Your Analog Incident Story Tidepool
You don’t need a new tool category to build an analog tidepool. You need habits and structures that let analogies and patterns surface.
Consider practices like:
1. Tag Incidents by Pattern, Not Just Component
Beyond “database” or “network,” add tags like:
- `unsafe_reversibility`
- `silent_degradation`
- `unverified_assumption`
- `orphaned_dependency`
Over time, these relational tags let you see waves forming across different services.
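Grouping incidents by relational tag instead of by component is a one-loop exercise. Here is a minimal sketch, assuming incidents are stored as records with a `tags` field; the incident IDs and storage shape are hypothetical:

```python
from collections import defaultdict

# Hypothetical incident records. Grouping by "component" alone would
# hide that the first two share the same relational pattern.
incidents = [
    {"id": "INC-101", "component": "database", "tags": ["unsafe_reversibility"]},
    {"id": "INC-142", "component": "feature-flags", "tags": ["unsafe_reversibility"]},
    {"id": "INC-187", "component": "cache", "tags": ["silent_degradation"]},
]

# Group by relational tag to see waves forming across different services.
by_pattern = defaultdict(list)
for incident in incidents:
    for tag in incident["tags"]:
        by_pattern[tag].append(incident["id"])

for tag, ids in sorted(by_pattern.items()):
    print(f"{tag}: {len(ids)} incident(s) -> {', '.join(ids)}")
```

The point is not the tooling: even a spreadsheet column works, as long as the tags name relationships rather than components.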
2. Treat Near Misses as First-Class Citizens
Create lightweight post-incident notes for:
- Rollbacks that almost didn’t work
- Alerts that were noisier than they should have been
- Manual interventions that “saved the day”
Capture: What larger failure would this have resembled if it had crossed the line?
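A lightweight near-miss note can be a small, fixed shape that forces the analogical question to be answered at capture time. This is one possible sketch; the field names and the example values are hypothetical:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class NearMissNote:
    """Lightweight record for a wave that almost crested."""
    summary: str
    observed_on: date
    # The key analogical question, answered while memory is fresh:
    # what larger failure would this have resembled if it had crossed the line?
    would_have_resembled: str
    relational_tags: list[str] = field(default_factory=list)

note = NearMissNote(
    summary="Rollback of the flag service took three manual retries",
    observed_on=date(2024, 3, 14),
    would_have_resembled="Last quarter's failed database failover",
    relational_tags=["unsafe_reversibility"],
)
```

Making `would_have_resembled` a required field is the design choice that matters: a note without it is just a log line, not fieldwork.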
3. Run Analog-Driven Review Sessions
On a regular cadence (monthly or quarterly):
- Bring a small set of incidents and near misses
- Ask explicitly: “What is this like?”
- Group stories into families of failure
Treat it like staring into a tidepool and naming the species.
4. Turn Stories into Reusable Heuristics
From patterns, derive rules of thumb like:
- “Whenever reversibility is critical, we must test rollback under load.”
- “Any new dependency needs a documented failure mode analysis.”
These heuristics are the bridges between past stories and future design choices.
The Quiet Power of Watching Tiny Failures
Treating your system as an analog tidepool is not about drama. It’s about quiet, sustained observation.
You:
- Learn to see recurring relational structures in different incidents
- Invite analogies instead of insisting that “this time is totally different”
- Use blameless postmortems to map accident pathways instead of hunting for culprits
- Pay deliberate attention to precursors, pathogens, near misses, and warning signs
Over time, the reward is subtle but profound:
You start to feel waves before they form.
Incidents stop being random storms and become recognizable weather patterns. You recognize the smell of an unsafe assumption. You hear the rhythm of a dependency strain. You see the familiar shape of a pathway that, last year, led to a painful outage—and you adjust course earlier this time.
That’s the value of the analog incident story tidepool. It’s not a dashboard, a metric, or a runbook. It’s a way of seeing.
And once you start seeing your tiny failures as the early waves of future outages, you can do what shorelines have always done best: shape, soften, and redirect the energy before it crashes into something that matters.
Sit down at your desk. Look at your alerts, your tickets, your postmortems.
This isn’t just operational noise. It’s your tidepool.
Start watching the water.