Rain Lag

The Analog Incident Story Ferris Clock: Rehearsing On‑Call Tradeoffs in Slow Motion

How a desk‑sized paper “Ferris clock” can turn incident response practice into a tangible, slow‑motion rehearsal that sharpens on‑call judgment, tradeoff thinking, and team coordination.

Introduction

Most teams only truly meet their incident response process when something is already on fire.

By then, it’s too late to thoughtfully practice tradeoffs, calibrate communication, or explore "what if" paths. You’re locked into the moment, racing the clock, and optimizing for survival more than learning.

What if you could rehearse incidents in slow motion? Not in a dense doc, or a slide deck, or yet another abstract retro—but with a physical, desk‑sized paper wheel that you and your team literally turn together.

Enter the Analog Incident Story Ferris Clock: a paper wheel you spin by hand to step through an incident timeline. It’s part cybersecurity tabletop exercise, part game, and part decision lab for on‑call teams.


What Is the Incident Story Ferris Clock?

Picture a large circular wheel (think: the size of a dinner plate or bigger) that sits on the table like an analog dashboard. Around its edge are time slices and decision points:

  • “Alert triggers”
  • “First response”
  • “Triage & diagnosis”
  • “Mitigation options”
  • “Comms up / comms out”
  • “Escalation & rollback”
  • “Post‑incident reflection”

You rotate the wheel clockwise, and each segment exposes:

  • A scenario card (context, symptoms, stakes)
  • A set of choices (e.g., "Roll back now" vs. "Gather more signals")
  • A quick view of tradeoffs and evaluation dimensions
  • Prompts about who does what and who gets informed

Instead of racing through an incident in real time, you advance the wheel manually—pausing at each step to talk through decisions, tradeoffs, and consequences.

It’s analog by design: no tabs, no notifications, no dashboards. Just the team, the scenario, and the wheel.
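If you want to draft the wheel's content digitally before printing it, the segments and their cards can be modeled as simple data structures. A minimal sketch (all names here are illustrative, not a prescribed schema):

```python
from dataclasses import dataclass, field

@dataclass
class DecisionOption:
    label: str      # e.g. "Roll back now"
    tradeoffs: str  # short note on what this optimizes for and what it risks

@dataclass
class WheelSegment:
    name: str       # e.g. "Triage & diagnosis"
    scenario: str   # context, symptoms, stakes
    options: list[DecisionOption] = field(default_factory=list)
    prompts: list[str] = field(default_factory=list)  # who does what, who gets informed

segment = WheelSegment(
    name="Mitigation options",
    scenario="Error rate doubled after the 14:00 deploy; cause unconfirmed.",
    options=[
        DecisionOption("Roll back now", "optimizes for speed; loses diagnostic state"),
        DecisionOption("Gather more signals", "optimizes for accuracy; risks a longer outage"),
    ],
    prompts=["Who owns the rollback?", "Who tells support?"],
)
```

Printing one card per `WheelSegment` keeps the physical wheel and your scenario library in sync.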


Treat It Like a Cybersecurity Tabletop Exercise

The Ferris clock works best when you run it like a cybersecurity tabletop exercise (TTX):

  1. Set the scene

    • The facilitator introduces today’s incident: maybe a partial outage, a data quality issue, or a suspicious access pattern.
    • Clarify assumptions: time of day, who’s on call, what tooling exists.
  2. Assign roles (even if it’s a small group)

    • Incident commander
    • Primary responder / fixer
    • Comms lead (internal + external)
    • Observer / scribe
  3. Turn the wheel, one segment at a time

    • At each step, the facilitator reads the prompt and options.
    • The team discusses what they’d actually do, not what the doc says they should do.
  4. Focus on paths, not just outcomes

    • TTXs are about how you get to a decision: what info you seek, who you pull in, how you communicate uncertainty.
    • The Ferris clock formalizes that: each segment is really a conversation starter about process.
  5. Capture friction and gaps

    • Where do people disagree?
    • What’s confusing about ownership or next steps?
    • What documentation or automation is clearly missing?

The goal isn’t to "win" the scenario. It’s to make the invisible visible: assumptions, habits, and failure modes that only surface under pressure.


Slow Motion as a Feature, Not a Bug

Real incidents compress time. The Ferris clock deliberately stretches it.

When you slow the story down, you can finally see the tradeoffs that get blurred in the rush:

  • Speed vs. safety
    Do we roll back now with incomplete information, or keep investigating and risk a longer outage?

  • Automation vs. human judgment
    Do we trigger an auto‑remediation play that usually works but might be risky, or have a human confirm the diagnosis first?

  • Short‑term fixes vs. long‑term resilience
    Do we hot‑patch a config and move on, or endure a bit more pain now to build a more resilient path?

The wheel forces the team to stop and narrate:

"If we choose Option A, what are we optimizing for? What risk are we accepting?"

This is where the real learning happens. You aren’t just memorizing a checklist; you’re training judgment.


Add Simple Tradeoff Mantras as Mental Models

Under pressure, people don’t recall long documents; they recall short, sticky phrases.

You can embed these as 3‑word mantras or short tradeoff frames on the wheel itself. A few examples:

  • "Stabilize before optimize" – In early incident phases, focus on stopping the bleeding, not making things elegant.
  • "Logs, then levers" – Observe before changing; collect signals before you pull levers.
  • "Safety over speed" – If human safety, data loss, or legal exposure is in play, err on the side of caution.
  • "Bias towards rollback" – When in doubt about a recent change, reversing it is often safer than inventing a new fix under stress.

You might adapt these from SRE, DevOps, or MLOps practices. For example, in ML incidents:

  • "Integrity before accuracy" – Don’t ship predictions you can’t trust, even if metrics look good.
  • "Explain, then scale" – Understand the failure mode before scaling a mitigation.

Write these mantras along the rim or spokes of the Ferris clock so that, as you turn the wheel, you’re constantly reminded of the default mental models you want responders to lean on.

Over time, these short phrases become automatic anchors when the real pager goes off.


Score Decisions with ML‑Inspired Evaluation Dimensions

To deepen the exercise, borrow ideas from ML benchmarks and apply them to incident decisions. For each decision point, evaluate options across dimensions like:

  • Accuracy – Are we correctly understanding the incident?

    • Did we validate hypotheses with data?
    • Are we distinguishing symptoms from root cause?
  • Robustness – How resilient is this response to variation and uncertainty?

    • If we’re wrong about the cause, does our action make things worse?
    • Will this approach still work if the incident mutates?
  • Bias – What blind spots or defaults are skewing our choice?

    • Are we over‑trusting certain dashboards or metrics?
    • Are we defaulting to "blame the network / DB / ML model" because we always do?
    • Are we ignoring non‑engineering stakeholders’ needs (support, legal, customer success)?
  • Efficiency – How well are we using time, people, and compute?

    • Are we escalating too early or too late?
    • Are we burning three senior engineers on something a runbook could handle?

On the Ferris clock, each segment can show a small scoring grid (1–5) for these dimensions. After a choice is made, the group:

  1. Scores the decision together.
  2. Reflects: What would we change to move one point higher on robustness or bias awareness?

This transforms the exercise from story‑time into structured skill building.
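The scoring grid above is easy to automate for note-taking. Here's a hypothetical helper that validates the 1–5 scores, averages them, and flags the weakest dimension so the group knows where to aim the "one point higher" discussion (the dimension names follow the article; the function itself is just an illustrative sketch):

```python
DIMENSIONS = ("accuracy", "robustness", "bias", "efficiency")

def score_decision(scores: dict[str, int]) -> dict:
    """Validate one 1-5 score per dimension and flag the weakest one."""
    for dim in DIMENSIONS:
        if not 1 <= scores[dim] <= 5:
            raise ValueError(f"{dim} must be scored 1-5")
    weakest = min(DIMENSIONS, key=lambda d: scores[d])
    return {
        "mean": sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS),
        "weakest": weakest,  # where the "one point higher" reflection should focus
    }

result = score_decision({"accuracy": 4, "robustness": 2, "bias": 3, "efficiency": 4})
# result["weakest"] == "robustness"; result["mean"] == 3.25
```

The scribe can record one such dict per decision point and compare them across sessions.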


Combining AI and Analog

AI tools pair surprisingly well with an analog wheel.

Use AI for the generative, variable, and heavy‑lift parts:

  • Crafting realistic incident narratives (infra, application, data, ML, security).
  • Varying parameters: traffic patterns, user impact, regulatory constraints.
  • Generating metrics snapshots, log snippets, or alert payloads.
  • Suggesting decision options with pros/cons.
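One low-effort way to get consistent scenario cards out of an AI assistant is a reusable prompt template. A sketch, assuming you paste the resulting prompt into whatever assistant you use (the function and field names are invented for illustration):

```python
def scenario_prompt(domain: str, severity: str, constraints: list[str]) -> str:
    """Build a reusable prompt asking an AI assistant to draft one scenario card."""
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"Draft an incident scenario card for a {severity} {domain} incident.\n"
        "Include: context, observable symptoms, stakes, and 2-3 decision options\n"
        "with pros and cons for each.\n"
        f"Constraints:\n{constraint_lines}"
    )

prompt = scenario_prompt(
    domain="data pipeline",
    severity="partial-outage",
    constraints=["EU traffic only", "on-call engineer is solo for the first 20 minutes"],
)
```

Varying `domain`, `severity`, and `constraints` gives you the parameter variation described above without rewriting the prompt each time.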

But keep the Ferris clock itself physical to:

  • Reduce distraction (no switching tabs mid‑exercise).
  • Encourage face‑to‑face discussion instead of silent Slack threads.
  • Create a shared visual artifact everyone can point to and modify with sticky notes.
  • Make the ritual feel different from day‑to‑day work.

You might:

  • Use AI beforehand to create a set of laminated scenario cards and decision option cards.
  • Print evaluation grids and mantras, then glue them onto the wheel.
  • After each session, feed notes back into an AI assistant to propose playbook updates and new scenarios that target weak spots.

The result is a tight loop: AI helps you design richer exercises; the analog clock helps your team stay present, coordinated, and reflective.


Making the Ferris Clock a Team Ritual

The Ferris clock only pays off if it’s used regularly, not as a one‑off workshop prop.

Consider making it a recurring ritual:

  • Cadence: a 45–60 minute session every 2–4 weeks.
  • Participants: On‑call engineers, SREs, on‑call managers, plus rotating guests from support or product.
  • Format:
    1. Pick a scenario (or draw blindly from a "random scenario" card stack).
    2. Assign roles.
    3. Turn the wheel through the incident.
    4. Score key decisions across accuracy, robustness, bias, efficiency.
    5. End with 2–3 concrete improvements: runbook changes, automation candidates, comms templates.

Track what changes over time:

  • Are decisions getting more consistent with your mantras?
  • Are fewer points of confusion coming up about ownership and escalation?
  • Are people referencing Ferris clock scenarios in real incidents ("This feels like Scenario 3—let’s try that rollback strategy")?
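If you keep the per-session scoring grids, the questions above become measurable. A minimal sketch of trend tracking, using invented example data (each session is one dict of 1–5 dimension scores, in chronological order):

```python
# Illustrative per-session scores from the Ferris clock grid, oldest first.
sessions = [
    {"accuracy": 3, "robustness": 2, "bias": 2, "efficiency": 4},
    {"accuracy": 4, "robustness": 3, "bias": 2, "efficiency": 4},
    {"accuracy": 4, "robustness": 4, "bias": 3, "efficiency": 5},
]

def dimension_trend(sessions: list[dict], dim: str) -> int:
    """Change in a dimension's score from the first session to the last."""
    return sessions[-1][dim] - sessions[0][dim]

trend = {dim: dimension_trend(sessions, dim) for dim in sessions[0]}
# e.g. robustness improved by 2 points, bias by 1
```

Even this crude first-to-last delta is enough to show the team whether practice is actually moving the needle on its weakest dimensions.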

When a real incident hits, responders will have muscle memory not just for commands and tools, but for:

  • How to structure their thinking.
  • How to explain tradeoffs out loud.
  • How to coordinate across roles.

That’s the real win.


Conclusion

The Incident Story Ferris Clock is intentionally low‑tech: paper, ink, and a bit of imagination. Yet it tackles a deeply modern problem—how to prepare teams for complex, high‑stakes, always‑on systems—by slowing everything down.

By treating it like a tabletop exercise, surfacing tradeoffs, grounding decisions in simple mantras, scoring them along ML‑inspired dimensions, and pairing AI‑generated content with an analog ritual, you give your team something they rarely have during real incidents:

  • Time to think.
  • Space to disagree.
  • A shared language for tradeoffs.

You can’t prevent every incident. But you can rehearse better.

Start with a blank circle of cardboard, a marker, and one scenario. Turn the wheel together. The next time the pager goes off, your team won’t just be reacting—they’ll be remembering how they practiced.
