Rain Lag

The Analog Incident Story Ferris Wheel: Rotating Through Past Outages One Car at a Time

How to turn past outages into a rotating Ferris wheel of stories, practice, and continuous learning—using analog horror vibes, structured retrospectives, and role‑play to build a more resilient engineering culture.

Incidents never really disappear.

They leave behind artifacts: grainy logs, broken dashboards, half-remembered Slack threads, and hastily written post-mortems. Most teams file these away and move on, hoping the same failure won’t return.

A better approach is to deliberately bring those incidents back—not as random ghosts, but as passengers on a Ferris wheel that you control.

In this model, each car on the Ferris wheel is a specific past incident. You rotate through them in a controlled, repeatable loop, revisiting each story with fresh eyes, more context, and better tools. Over time, this “analog incident Ferris wheel” becomes a powerful engine for resilience, training, and continuous improvement.

This post explores how to:

  • Use the Ferris wheel metaphor to organize and re-use past incidents
  • Frame retrospectives with an analog horror narrative to reveal hidden risks
  • Run data-driven incident reviews that actually change behavior
  • Practice with “Wheel of Misfortune” role-play exercises
  • Balance automated incident management (AIM) with human-led reflection
  • Turn outages into memorable stories instead of dry reports
  • Build a continuous rotation of learning, updates, and improvements

The Ferris Wheel of Incidents: A Different Mental Model

Imagine your incident history as a Ferris wheel.

  • Each car is a past outage: a specific, contained event.
  • The wheel is your review schedule and process.
  • The rotation is your continuous improvement cadence.

At any given time, one car is at the top—your current focus incident. But the others are still there, waiting for their next ride to the top.

Instead of:

  • Writing a post-mortem once,
  • Sharing it around,
  • And never looking at it again,

…you schedule rotations:

  • Every 2–4 weeks, a different past incident comes to the top.
  • The team revisits it with:
    • New data
    • Updated systems
    • Fresh team members
    • New tools and perspectives

You don’t rely on memory. You systematically re-board each car and look out from its vantage point again.

This keeps old lessons from fading and turns your incident history into an active training and design asset, not an archive.
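
The rotation described above can be sketched in a few lines of Python. This is only an illustration: the `IncidentCar` class, the `rotation_schedule` function, the incident IDs and titles, and the three-week default interval are all made up for the example, not part of any particular tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from itertools import cycle, islice


@dataclass(frozen=True)
class IncidentCar:
    incident_id: str  # hypothetical identifier, e.g. a ticket key
    title: str


def rotation_schedule(cars, start, interval_weeks=3, rounds=1):
    """Return (review_date, car) pairs, cycling through the cars in order."""
    if not cars:
        return []
    total = len(cars) * rounds
    return [
        (start + timedelta(weeks=interval_weeks * i), car)
        for i, car in enumerate(islice(cycle(cars), total))
    ]


# Illustrative cars; replace with your own incident history.
cars = [
    IncidentCar("INC-001", "DNS resolution failure"),
    IncidentCar("INC-002", "Database failover stall"),
    IncidentCar("INC-003", "Expired TLS certificate"),
]

schedule = rotation_schedule(cars, start=date(2025, 1, 6), rounds=2)
for review_date, car in schedule:
    print(review_date.isoformat(), car.incident_id, car.title)
```

With `rounds=2`, every car reaches the top twice, so the first incident is revisited after the other two have had their turn.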


Analog Horror as a Retrospective Lens

Think of an incident retrospective as a piece of analog horror:

  • You’re watching unsettling “found footage”: logs, metric graphs, on-call pages, escalations, and Slack messages.
  • The initial story looks like chaos, but as you rewind and rewatch, patterns emerge.
  • The horror isn’t jump scares; it’s the slow realization of systemic risks you’d been ignoring.

This framing is useful because it:

  1. Centers the artifacts
    Logs, graphs, alerts, dashboards, customer tickets, and traces aren’t just byproducts; they’re the film. You walk through them like a crime-scene investigation.

  2. Encourages forensic thinking
    “When did this weird metric start drifting?” “Why did no one see this alert?” “What did we misinterpret in the moment?”

  3. Highlights ambient dread
    The goal isn’t to scare the team, but to surface slow-burn risks:

    • Overly complex dependencies
    • Fragile runbooks
    • Single points of failure in people or systems

Treat each retrospective as a story reconstruction from eerie, incomplete evidence. That mindset naturally pushes you toward better observability, clearer runbooks, and cleaner architectures.


Data-Driven Retrospectives: Structure Over Vibes

The horror aesthetic is helpful, but the substance has to be data-driven. A good incident retrospective has three stages:

1. Prepare Thoroughly

Before the meeting:

  • Collect artifacts:
    • Logs and traces
    • Metrics and graphs (with time windows around the incident)
    • Alert timelines
    • Incident channel transcripts
    • Customer impact timelines
  • Build a neutral timeline:
    • What happened, when, and who did what
    • Avoid blame; just sequence events
  • Identify guiding questions:
    • Where did detection lag?
    • Where did communication fail?
    • Where did tooling or runbooks help or hurt?
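
One way to build the neutral timeline is to merge events from each artifact source and sort them by timestamp. The sketch below assumes the events have already been extracted by hand or by your own tooling; `TimelineEvent`, `build_timeline`, and all sample events are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimelineEvent:
    at: datetime
    source: str   # e.g. "alert", "chat", "metric", "customer"
    summary: str  # what happened, stated without blame


def build_timeline(*sources):
    """Merge event lists from several artifact sources into one chronological list."""
    merged = [event for events in sources for event in events]
    return sorted(merged, key=lambda e: e.at)


# Made-up sample events for illustration only.
alerts = [TimelineEvent(datetime(2025, 3, 1, 14, 7), "alert", "Error-rate alert fired")]
chat = [
    TimelineEvent(datetime(2025, 3, 1, 14, 12), "chat", "On-call acknowledged the page"),
    TimelineEvent(datetime(2025, 3, 1, 14, 40), "chat", "Rollback started"),
]
metrics = [TimelineEvent(datetime(2025, 3, 1, 13, 52), "metric", "Latency began drifting upward")]

timeline = build_timeline(alerts, chat, metrics)
for e in timeline:
    print(e.at.strftime("%H:%M"), f"[{e.source}]", e.summary)
```

Keeping `summary` as a plain description of what happened, with no judgment attached, is what keeps the timeline neutral.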

2. Facilitate With Structure

In the session, use a consistent agenda:

  1. Context & stakes – What was at risk? Revenue, trust, safety, SLOs?
  2. Timeline walkthrough – Walk minute-by-minute with artifacts on screen.
  3. Detection analysis – When could we have known, and when did we actually know?
  4. Decision points – Where did we choose a path that later constrained us?
  5. Systemic factors – Org silos, unclear ownership, missing runbooks, fragile designs.
  6. What worked well – Don’t skip this; it’s crucial for confidence.
  7. Actions – Concrete, testable follow-ups.
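
The detection analysis step comes down to simple arithmetic: the gap between the earliest signal visible in the artifacts and the moment the team actually knew. A minimal sketch, with a hypothetical `detection_lag` helper and made-up timestamps:

```python
from datetime import datetime, timedelta


def detection_lag(first_signal: datetime, detected: datetime) -> timedelta:
    """Time between the earliest visible signal and the moment the team knew."""
    if detected < first_signal:
        raise ValueError("detection cannot precede the first signal")
    return detected - first_signal


first_signal = datetime(2025, 3, 1, 13, 52)  # metric started drifting (made-up)
detected = datetime(2025, 3, 1, 14, 7)       # first alert fired (made-up)

lag = detection_lag(first_signal, detected)
print(f"Detection lag: {lag.total_seconds() / 60:.0f} minutes")  # prints "Detection lag: 15 minutes"
```

Recording this number for each incident gives later rotations something concrete to compare against.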

3. Follow Up With Owners and Timelines

Retrospectives without follow-through are just storytelling.

  • Convert insights into tickets with owners, priorities, and deadlines.
  • Tag them to the incident ID so future rotations can see what changed.
  • Revisit those actions the next time that car comes around the wheel:
    • Did we update the runbook?
    • Did the new alert actually fire earlier in subsequent smaller incidents?
    • Did we fix that flaky dependency or just add a band-aid?
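
Tagging follow-ups to the incident ID makes that revisit easy to automate. The sketch below is not tied to any ticketing system: `ActionItem`, `open_actions`, the owner names, and the dates are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    incident_id: str
    description: str
    owner: str
    due: date
    done: bool = False


def open_actions(actions, incident_id, today):
    """Return unfinished actions for one incident, overdue ones first."""
    pending = [a for a in actions if a.incident_id == incident_id and not a.done]
    # False sorts before True, so items with due < today come first.
    return sorted(pending, key=lambda a: (a.due >= today, a.due))


# Made-up follow-ups for illustration only.
actions = [
    ActionItem("INC-002", "Update failover runbook", "dana", date(2025, 2, 1), done=True),
    ActionItem("INC-002", "Add replication-lag alert", "lee", date(2025, 4, 15)),
    ActionItem("INC-002", "Replace flaky dependency", "sam", date(2025, 2, 20)),
    ActionItem("INC-003", "Automate certificate renewal", "kim", date(2025, 3, 1)),
]

today = date(2025, 3, 10)
for a in open_actions(actions, "INC-002", today):
    status = "OVERDUE" if a.due < today else "open"
    print(f"{a.incident_id} {status:8} {a.due} {a.owner}: {a.description}")
```

Running this at the start of a rotation shows at a glance which promises from the last ride were kept.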

The Wheel of Misfortune: Role-Playing the Past

One of the most effective ways to breathe life into your Ferris wheel is the Wheel of Misfortune: a role-play exercise where your team re-enacts real past outages.

How it works:

  1. Pick a past incident (one car on the wheel).
  2. Prepare the starting state:
    • Simulated alerts
    • Partial dashboards
    • A stubbed incident channel
  3. Assign roles:
    • Incident commander
    • Communications lead
    • Subject-matter experts (database, networking, SRE, product, etc.)
  4. Run the simulation in real time:
    • Reveal logs, symptoms, and misleading clues gradually.
    • Let the team debug, decide, and communicate.
    • Occasionally “inject” complications (e.g., conflicting signals, stakeholder pings).
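
A facilitator can script the gradual reveal ahead of time. The following sketch assumes a simple list of timed clues; `Reveal`, `revealed_so_far`, and the scenario text are invented for the example and should be replaced with details from the real incident.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reveal:
    minute: int  # minutes after the exercise starts
    kind: str    # "alert", "log", "inject", ...
    text: str


def revealed_so_far(script, elapsed_minutes):
    """Return the clues the facilitator should have shown by now, in order."""
    ordered = sorted(script, key=lambda r: r.minute)
    return [r for r in ordered if r.minute <= elapsed_minutes]


# Made-up scenario for illustration only.
script = [
    Reveal(0, "alert", "Checkout error rate above threshold"),
    Reveal(5, "log", "Connection pool exhausted on one database replica"),
    Reveal(10, "inject", "Stakeholder asks for an ETA in the incident channel"),
    Reveal(15, "log", "A deploy finished shortly before the first alert"),
]

for reveal in revealed_so_far(script, elapsed_minutes=10):
    print(f"t+{reveal.minute:02d}m [{reveal.kind}] {reveal.text}")
```

Holding the most telling clue until late in the script recreates the uncertainty the original responders faced.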

Benefits:

  • New engineers build intuition for how incidents feel and flow.
  • The team practices roles, communication, and decision-making under pressure.
  • You see where runbooks are confusing or incomplete—in a safe environment.
  • You normalize talking about failure instead of hiding it.

Over time, your Ferris wheel becomes not just a library of stories, but a training ground your team regularly rides.


Automation vs. Reflection: AIM Isn’t the Whole Story

Modern Automated Incident Management (AIM) is powerful:

  • Auto-detection via anomaly detection and SLOs
  • Auto-routing of alerts and incident creation
  • Automated remediation runbooks and rollbacks
  • Bot-driven status updates and communication templates

These are essential for speed and reliability. But they cannot replace human-led reflection.

Automation excels at:

  • Detecting faster
  • Mitigating quicker
  • Reducing toil

Humans excel at:

  • Interpreting ambiguous patterns across multiple incidents
  • Weighing trade-offs: speed vs. safety, cost vs. resilience
  • Designing simpler systems and clearer processes
  • Creating stories that stick in people’s heads

Your Ferris wheel is where humans do the work that automation can’t:

  • Not “How do we page faster?” but “Why does this fragile dependency exist at all?”
  • Not “How can we add another alert?” but “Why is our mental model of this system so wrong?”

Automation keeps the ride safe and smooth. Human reflection decides whether the Ferris wheel should even be built that way.


Turning Outages Into Stories People Remember

Most post-mortems read like tax documents.

Instead, treat each outage as a story with:

  • Characters – on-call engineer, incident commander, customers, third-party providers
  • Stakes – what could be lost: money, trust, data, reputation
  • Inciting incident – the first weird alert, strange error, or customer ticket
  • Rising tension – graphs spiking, Slack channels filling, executives joining
  • Turning point – the “aha” moment where someone sees the real cause
  • Resolution – mitigation, recovery, and immediate follow-ups
  • Aftermath – what changed in the system and in your understanding

Telling incidents as stories makes them:

  • Easier to remember
  • Easier to teach to new team members
  • Easier to connect across seemingly unrelated outages (“This feels like that DNS issue we had last year…”)

When that incident car comes around again, people recall not only the metrics, but the narrative arc—which is precisely what you want in a crisis.


Continuous Improvement: Keeping the Wheel in Motion

A Ferris wheel that never moves is just a sculpture. The value comes from regular rotation.

Operationalize the rotation:

  • Maintain an incident backlog: a prioritized list of past incidents (cars) to revisit.
  • Set a cadence: e.g., one retrospective refresh or Wheel of Misfortune every 2–4 weeks.
  • Track what changes each rotation:
    • Runbook updates
    • Tooling improvements
    • Architecture changes
    • New alerts or SLOs
    • Training improvements
  • Periodically retire cars:
    • When the architecture or product has changed so much that the incident is no longer relevant.
    • Celebrate: that risk class has been genuinely addressed.

This turns “learning from incidents” from an aspiration into a rhythmic practice.
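
The backlog and retirement steps above can be sketched as a small selection routine. The rule used here (pick the active car that has gone longest without a review) is one assumption among many possible prioritization schemes; `BacklogCar` and `next_car` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class BacklogCar:
    incident_id: str
    last_reviewed: Optional[date] = None  # None means never revisited
    retired: bool = False


def next_car(backlog):
    """Pick the active car that has gone longest without a review."""
    active = [c for c in backlog if not c.retired]
    if not active:
        return None
    # Cars that were never reviewed sort first via date.min.
    return min(active, key=lambda c: c.last_reviewed or date.min)


# Made-up backlog for illustration only.
backlog = [
    BacklogCar("INC-001", last_reviewed=date(2025, 1, 6)),
    BacklogCar("INC-002"),  # never revisited
    BacklogCar("INC-003", last_reviewed=date(2024, 11, 4), retired=True),
]

car = next_car(backlog)
print("Next to the top:", car.incident_id)
car.last_reviewed = date(2025, 3, 10)  # record the review so the wheel keeps turning
print("After that:", next_car(backlog).incident_id)
```

Retired cars stay in the list, so the history of what was addressed remains visible even though they no longer come up for review.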


Conclusion: Build Your Own Analog Incident Ferris Wheel

Incidents are inevitable. Wasting them is optional.

By treating each outage as a story, organizing them into a Ferris wheel of revisitable cars, and leaning into both analog horror-style forensics and structured, data-driven analysis, you can turn your incident history into a durable competitive advantage.

Add to that the Wheel of Misfortune for practice and the right balance between AIM-driven speed and human-driven insight, and you get more than reliability—you get a culture that expects to learn, not just to fix.

Don’t let your old incidents fade into the archive.

Put them on the wheel.

Rotate them.

Ride them again—on your terms—and come back down each time with a slightly safer, smarter, more resilient system than before.
