Rain Lag

The Paper Incident Story Switchboard: Hand‑Wiring Outage Calls Before Signals Cross

How to design realistic, high‑value incident response tabletop exercises that surface communication gaps, test your outage playbooks, and prepare your team for real‑world crises.

When an outage hits, your systems aren’t the only things under load—your people and your communication channels are, too. Status pages light up, execs want answers, customers are angry, and Slack feels like a firehose. By the time signals start crossing, it’s often too late to untangle them in real time.

That’s where the “paper incident story switchboard” comes in: a deliberately low‑tech, high‑fidelity way to hand‑wire an incident before it happens. Using realistic tabletop exercises, you walk through outages and attacks on paper (and in conversation) so that when the real calls start coming in, your team already knows which wires to connect.

This post breaks down how to build those exercises so they match your real threat landscape, test both your technical and communication muscles, and turn lessons into continuous improvement.


Why Tabletop Exercises Need to Be Uncomfortably Real

A lot of tabletop exercises fail because they feel like “compliance theater”: generic, abstract scenarios that never touch the sharp edges of your actual risk.

To be useful, your exercises must be tailored to your threat landscape, not pulled from a template. That means:

  • Ransomware for orgs with large data stores, legacy systems, or critical business‑to‑business integrations.
  • DDoS attacks for customer‑facing platforms where availability is the top SLO.
  • Phishing and business email compromise (BEC) for orgs with high‑value financial workflows or distributed approvals.
  • Insider threats for environments with privileged access, sensitive IP, or complex vendor ecosystems.

Ask pointed questions:

  • What’s the single worst‑day scenario for our revenue?
  • What would most damage our brand or trust?
  • Which systems, if down, put lives or safety at risk?

Your paper incident story switchboard should route through these scenarios—not the textbook ones. The more the scenario makes people say, “This actually could happen here,” the more seriously they will engage.


Two Styles of Tabletop: Discussion vs. Operational

Think of tabletop exercises on a spectrum from whiteboard storytelling to mini‑drills.

1. Discussion‑Based Exercises

These are primarily conversational:

  • You present the scenario in stages ("injects").
  • Participants talk through what they would do.
  • The facilitator tracks decisions, questions, and gaps.

Use this style to:

  • Clarify roles and ownership.
  • Explore “what if” branches and edge cases.
  • Involve executives and non‑technical stakeholders.

2. Operational (Hands‑On) Exercises

These add real tools and systems into the mix:

  • Participants look at dashboards, run commands (in a safe or simulated environment), and use real communication channels.
  • Decisions are time‑boxed to simulate pressure.

Use this style to:

  • Practice using monitoring, tracing, and runbooks under stress.
  • Surface friction in tooling: missing dashboards, unclear alerts, noisy logs.
  • Validate that the incident response plan is actually executable.

Both formats are valuable. Discussion-based exercises uncover assumptions; operational ones expose practical friction. Your switchboard should alternate between them over time so people can both reason and execute under pressure.


Start with a Clear Purpose, Not a Clever Story

It’s tempting to design Hollywood‑level drama into your scenario. But before you invent plot twists, write a plain, specific purpose statement:

“This exercise is intended to test our ability to triage a suspected ransomware incident, coordinate internal and external communications, and make a go/no‑go decision on paying ransom within two hours.”

Or:

“This exercise is intended to test how our SRE, support, and comms teams handle a sudden latency spike in our payment API and provide accurate customer updates within 30 minutes.”

From that purpose, build a planning checklist like:

  • What systems, teams, and time zones must be involved?
  • Which playbooks or runbooks should this exercise touch?
  • What decisions must be made during the scenario?
  • Which metrics, logs, or traces should participants see?
  • Who is observing and who is facilitating?
  • How will we capture outcomes and follow‑ups?

The scenario is the story; the purpose and checklist are the wiring diagram behind the switchboard. They keep the conversation aligned with what you’re actually trying to test.
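One way to keep the purpose statement and checklist together is to record them as structured data and flag whatever is still blank before the exercise is scheduled. The sketch below is only an illustration: the `ExercisePlan` class, its field names, and the sample values are assumptions, not part of any standard tool.

```python
from dataclasses import dataclass, field


@dataclass
class ExercisePlan:
    """Hypothetical record of the purpose statement and planning checklist."""
    purpose: str
    teams: list[str] = field(default_factory=list)
    runbooks: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    signals: list[str] = field(default_factory=list)  # metrics, logs, or traces to show
    facilitator: str = ""
    observers: list[str] = field(default_factory=list)
    follow_up_owner: str = ""

    def missing_items(self) -> list[str]:
        """Return the names of checklist fields that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# Sample values are placeholders for illustration.
plan = ExercisePlan(
    purpose=(
        "Test how SRE, support, and comms handle a sudden latency spike in the "
        "payment API and provide accurate customer updates within 30 minutes."
    ),
    teams=["SRE", "Support", "Comms"],
    decisions=["Declare an incident?", "Roll back or patch?"],
    facilitator="facilitator-name",
)

print(plan.missing_items())
```

An empty result from `missing_items()` does not mean the plan is good, only that every checklist question has at least some answer written down.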


Designing the Paper Story: From First Alert to Postmortem

A strong tabletop scenario unfolds over time, like a call that keeps getting transferred. Build it as a sequence of injects:

  1. Initial Signal

    • Example: “Checkout latency has doubled on the EU cluster; error rate is creeping up.”
    • Prompt: Who notices? Who’s on point? What’s your first move?
  2. Escalating Symptoms

    • Example: “Support tickets spike. Social media reports failed payments.”
    • Prompt: When do you declare an incident? Who is paged? How is severity set?
  3. Conflicting Data

    • Example: “APM shows normal CPU; distributed tracing shows timeouts between two microservices.”
    • Prompt: How do you reconcile these signals? Is this a network, database, or code problem? Who owns each part?
  4. Executive and Customer Pressure

    • Example: “Sales escalates a major customer’s threat to churn; a VP jumps into the Slack channel.”
    • Prompt: Who talks to leadership? How do you prevent decision thrash?
  5. Decision Point

    • Example: “Rollback will cause 30 minutes more downtime; pushing a patch is risky but might fix it immediately.”
    • Prompt: Who decides? Based on what information and which SLOs?
  6. Stabilization & Recovery

    • Example: “Error rates return to baseline, but a subset of users may have inconsistent data.”
    • Prompt: How do you verify recovery? How do you handle data correction and communication?
  7. Post‑Incident Review Setup

    • Prompt: What artifacts do you save? Who must attend the review? How soon do you hold it?

By the end, you should have traced the full life cycle of the outage story—from the first hint of trouble to the retrospective invite.
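The inject sequence above can be written down as a small script so the facilitator knows what to reveal and when. This is a minimal sketch: the `Inject` class, the `due_injects` helper, and the minute offsets are assumptions made for illustration, and only the first three injects are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inject:
    minute: int   # minutes after exercise start (assumed pacing)
    stage: str
    example: str
    prompt: str


INJECTS = [
    Inject(0, "Initial Signal",
           "Checkout latency has doubled on the EU cluster; error rate is creeping up.",
           "Who notices? Who's on point? What's your first move?"),
    Inject(10, "Escalating Symptoms",
           "Support tickets spike. Social media reports failed payments.",
           "When do you declare an incident? Who is paged? How is severity set?"),
    Inject(20, "Conflicting Data",
           "APM shows normal CPU; distributed tracing shows timeouts between two microservices.",
           "Is this network, database, or code? Who owns each part?"),
]


def due_injects(injects, elapsed_minutes):
    """Return the injects whose scheduled minute has been reached, in delivery order."""
    due = [inject for inject in injects if inject.minute <= elapsed_minutes]
    return sorted(due, key=lambda inject: inject.minute)


for inject in due_injects(INJECTS, elapsed_minutes=12):
    print(f"[T+{inject.minute:02d}] {inject.stage}: {inject.example}")
    print(f"        Prompt: {inject.prompt}")
```

Keeping the timing in data rather than in the facilitator's head makes it easier to rerun the same scenario with a different team and compare where each one hesitated.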


The Backbone: A Living Incident Response Plan

Tabletop exercises are only as good as the plan they’re testing. A useful incident response plan covers at least three pillars:

  1. Triage

    • How severity is defined (SEV‑1 vs SEV‑3).
    • How incidents are declared and who can declare them.
    • Initial containment steps and safety checks.
  2. Communication Protocols

    • Internal: which channels, who leads them, and how updates are structured.
    • External: status page, customer emails, public statements.
    • Executive communication: what gets escalated and how often.
  3. Escalation Paths

    • On‑call rotations and backup roles.
    • Legal, compliance, and PR escalation (especially for security incidents).
    • Vendor and partner escalation (cloud providers, payment processors, etc.).

Your paper switchboard exercise should stress these pathways on purpose. When something jams—unclear ownership, missing contacts, or conflicting instructions—you’ve found wiring to fix.
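Severity definitions and escalation paths are easier to stress-test when they are explicit enough to write as code. The following is a sketch under stated assumptions: the thresholds, the SEV‑2 level, and the role names in `ESCALATION` are invented for illustration, and your own plan should supply the real ones.

```python
from enum import IntEnum


class Severity(IntEnum):
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3


# Hypothetical escalation table; replace the roles with those in your own plan.
ESCALATION = {
    Severity.SEV1: ["on-call engineer", "incident commander", "executive sponsor", "legal/PR"],
    Severity.SEV2: ["on-call engineer", "incident commander"],
    Severity.SEV3: ["on-call engineer"],
}


def classify(customer_facing: bool, error_rate: float, data_at_risk: bool) -> Severity:
    """Map a few triage facts to a severity level (thresholds are illustrative only)."""
    if data_at_risk or (customer_facing and error_rate >= 0.05):
        return Severity.SEV1
    if customer_facing and error_rate >= 0.01:
        return Severity.SEV2
    return Severity.SEV3


severity = classify(customer_facing=True, error_rate=0.08, data_at_risk=False)
print(severity.name, "->", ", ".join(ESCALATION[severity]))
```

If participants in the exercise disagree with what a table like this says, that disagreement is itself a finding worth recording.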

And you’re never done. Every exercise and real incident should feed back into the plan:

  • Update runbooks and checklists.
  • Clarify who owns which systems or decisions.
  • Improve training and documentation for new responders.

Don’t Forget the Signals: Distributed Tracing and the Four Golden Signals

Modern outages aren’t simple single‑server failures; they’re often failures in complex distributed systems. Your switchboard scenarios should reflect that.

Two concepts are critical here:

Distributed Tracing

Distributed tracing lets you follow a single request as it hops across services. In a scenario, traces can:

  • Reveal where latency is introduced (e.g., a slow downstream dependency).
  • Show fan‑out and retries that amplify load.
  • Help differentiate between code, network, and data store issues.
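To make the first point concrete, here is a small sketch of how a trace can show where latency is introduced. The span data is invented, and the calculation assumes each span's direct children run one after another rather than in parallel. Real tracing backends do this analysis for you; the code only shows the idea.

```python
from collections import defaultdict

# Hypothetical spans from one checkout request:
# (span_id, parent_id, service, start_ms, end_ms)
SPANS = [
    ("a", None, "api-gateway", 0, 950),
    ("b", "a", "checkout", 10, 940),
    ("c", "b", "inventory", 20, 60),
    ("d", "b", "payments", 70, 920),
    ("e", "d", "payments-db", 80, 120),
]


def self_times(spans):
    """Time spent in each span excluding its direct children.

    Assumes children of a span run sequentially and that service names
    are unique within the trace.
    """
    child_total = defaultdict(int)
    for _span_id, parent_id, _service, start, end in spans:
        if parent_id is not None:
            child_total[parent_id] += end - start
    return {
        service: (end - start) - child_total[span_id]
        for span_id, _parent_id, service, start, end in spans
    }


times = self_times(SPANS)
slowest = max(times, key=times.get)
print(f"Most self time: {slowest} ({times[slowest]} ms)")
```

In this made-up trace the payments service accounts for most of the request time while its database call is fast, which points the investigation at the service's own code or its outbound calls rather than the data store.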

The Four Golden Signals

Design your injects and dashboards around these four signals:

  1. Latency – How long requests take.
  2. Traffic – How much demand is hitting the system.
  3. Errors – How often requests fail.
  4. Saturation – How “full” your resources are (CPU, memory, connections).

In an operational tabletop, give participants realistic but incomplete metrics and traces. The goal isn’t to make them guess blindly; it’s to help them practice forming and testing hypotheses under pressure.
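When preparing metrics for an inject, it can help to derive all four signals from the same sample window so the numbers stay consistent with each other. This is a minimal sketch: the request samples, the 60-second window, and the CPU figure are invented, and the 95th percentile uses the simple nearest-rank method.

```python
import math

# Hypothetical request samples from one window: (latency_ms, succeeded)
WINDOW_SECONDS = 60
REQUESTS = [(120, True), (135, True), (980, False), (140, True), (1500, False), (110, True)]
CPU_UTILIZATION = 0.42  # assumed reading from a host metric


def golden_signals(requests, window_seconds, utilization):
    """Summarize latency, traffic, errors, and saturation for one window."""
    if not requests:
        raise ValueError("need at least one request sample")
    latencies = sorted(latency for latency, _ok in requests)
    rank = math.ceil(0.95 * len(latencies))  # nearest-rank p95
    failures = sum(1 for _latency, ok in requests if not ok)
    return {
        "latency_p95_ms": latencies[rank - 1],
        "traffic_rps": len(requests) / window_seconds,
        "error_rate": failures / len(requests),
        "saturation": utilization,
    }


signals = golden_signals(REQUESTS, WINDOW_SECONDS, CPU_UTILIZATION)
for name, value in signals.items():
    print(f"{name}: {value:.3g}")
```

The sample mirrors the conflicting-data pattern described earlier: saturation looks healthy while latency and errors do not, which pushes participants to form a hypothesis rather than read the answer off one dashboard.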


Communication Failures Hurt More Than Technical Ones

History is full of major crises where poor communication made everything worse:

  • Three Mile Island – Confusing signals and miscommunication delayed understanding the severity of the nuclear accident.
  • Deepwater Horizon – Fragmented accountability and misaligned risk communication contributed to catastrophic decisions.
  • United Airlines’ passenger removal incident – Poor internal and external messaging turned a bad situation into a global PR disaster.
  • Dreamworld (Australia) – Inadequate and chaotic communication in the aftermath of a fatal ride incident deepened public distrust.

Your organization may not run nuclear plants or theme parks, but the lesson is the same: crisis communication is as important as technical response.

Your paper incident story switchboard should actively test:

  • How information flows between technical responders, leadership, PR, legal, and support.
  • How quickly and clearly you can explain risk, not just root cause.
  • How you avoid contradictions between what you tell customers and what you know internally.

If comms is left out of your tabletop exercises, you’re only testing half your response.


After the Exercise: Turn Stories into Systemic Change

The exercise ends when people stand up, but most of its value is created afterward:

  1. Structured Debrief

    • What worked well?
    • Where did we hesitate or get stuck?
    • What decisions were unclear, and why?
  2. Concrete Action Items

    • Assign owners and deadlines.
    • Prioritize by impact on safety, customer trust, or outage duration.
  3. Share the Story Widely

    • Summarize scenario, decisions, and improvements.
    • Use it as training material for new team members.

The switchboard metaphor applies here too: the more you rehearse and refine the wiring, the less likely signals are to cross when it really matters.
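Action items from the debrief can be tracked with owners and deadlines and sorted by impact. The sketch below is an assumption-heavy illustration: the `ActionItem` class, the sample items, and the ranking of safety ahead of customer trust ahead of outage duration are choices made for this example, not a rule from any standard.

```python
from dataclasses import dataclass
from datetime import date

# Assumed ordering of impact categories; adjust to your own priorities.
IMPACT_RANK = {"safety": 0, "customer trust": 1, "outage duration": 2}


@dataclass
class ActionItem:
    title: str
    owner: str
    due: date
    impact: str  # one of the keys in IMPACT_RANK


def prioritize(items):
    """Order items by impact category first, then by earliest deadline."""
    return sorted(items, key=lambda item: (IMPACT_RANK[item.impact], item.due))


# Sample items are placeholders for illustration.
items = [
    ActionItem("Add tracing to payment retries", "sre-lead", date(2030, 1, 15), "outage duration"),
    ActionItem("Draft status page templates", "comms-lead", date(2030, 1, 10), "customer trust"),
    ActionItem("Document manual failover safety checks", "ops-lead", date(2030, 1, 20), "safety"),
]

for item in prioritize(items):
    print(f"{item.due.isoformat()}  [{item.impact}]  {item.title} ({item.owner})")
```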


Conclusion: Practice Before the Lines Light Up

You can’t predict the exact shape of your next outage or security incident, but you can practice the muscles you’ll need: clear roles, fast triage, meaningful signals, and disciplined communication.

By building a paper incident story switchboard—realistic, purpose‑driven tabletop exercises that combine technical diagnosis with communication under pressure—you hand‑wire the critical connections before the crisis.

Then, when the dashboards turn red, phones ring, and Slack explodes, your team won’t be improvising wiring diagrams. They’ll be following patterns they’ve already rehearsed—leaving more attention for what actually matters: protecting your customers, your people, and your business.
