The Paper Switchboard: Orchestrating Modern Incident Command With String, Clips, and a Wall of Handwritten Signals
How a low-tech ‘paper switchboard’ metaphor can transform modern incident management—connecting on-call tooling, clear roles, communication, and dependency graphs into a coherent command system.
Imagine walking into an incident command room and seeing a giant wall covered in index cards, names, colored string, and binder clips. Each string shows who’s on call. Cards mark systems, owners, and status. Clips pin active incidents to the services they affect. Timelines and updates get scribbled on sticky notes in the margins.
It looks almost absurdly low-tech—a paper switchboard in the age of cloud-native everything. But that mental model is exactly what many modern incident management platforms are trying to recreate: a living, shared map of who’s involved, what’s broken, how things depend on each other, and where communication needs to flow.
Behind today’s polished dashboards and automated alerts, the most effective incident response is still about the same fundamentals: coordination, clarity, and communication. The trick is orchestrating them in ways that reduce operational toil and accelerate resolution rather than adding noise.
From Firefighting to Orchestration: The Goal of Modern Incident Management
Modern incident management has two main goals:
- Reduce operational toil – Minimize the repetitive, manual, and error-prone tasks that slow responders down (e.g., hunting for the right on-call, manually sending status updates, copying info between tools).
- Accelerate incident resolution – Help teams detect, triage, and remediate issues faster, while keeping stakeholders aligned and customers informed.
Think back to the paper switchboard on the wall:
- If someone needs the database team, they should be able to glance at the wall and see exactly who to call.
- If the payments API is down, they should see right away which systems depend on it.
- If leadership asks, “What’s going on?” there should be a clear, up-to-date incident summary visible to everyone.
Modern tooling is essentially making that wall digital, adaptive, and integrated. It pulls in signals from monitoring, connects to your on-call schedules, creates incident rooms, and helps you orchestrate the flow of work and communication.
On-Call Management as the Patch Panel of Incident Command
Traditional telephone switchboards used patch cables to connect callers. In incident management, on-call management and integration platforms (such as xMatters, PagerDuty, or Opsgenie) are your modern patch panels.
They sit at the center of your incident workflow and:
- Route alerts from monitoring tools to the right on-call engineers or teams
- Automate engagement, escalating until someone responds
- Trigger playbooks and workflows, such as creating incident tickets, Slack/Teams channels, or Zoom bridges
- Coordinate multi-team response, ensuring the right mix of specialists is brought together quickly
Instead of fumbling around asking, “Who’s on call for this?” your platform acts like an operator at the switchboard, instantly connecting the right people and systems. This reduces the initial chaos and frees responders to focus on diagnosis and remediation rather than logistics.
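To make the operator idea concrete, here is a minimal Python sketch of routing plus tiered escalation. It does not use any vendor's API: `EscalationPolicy`, `ROUTING`, `page_until_ack`, the five-minute acknowledgement window, and the responder names are all hypothetical, and paging is simulated by recording who would be notified at which minute.

```python
from dataclasses import dataclass


@dataclass
class EscalationPolicy:
    """Hypothetical policy: ordered tiers of responders sharing one ack window."""
    tiers: list[list[str]]
    ack_timeout_min: int = 5


# Hypothetical routing table: which policy handles alerts from which service.
ROUTING = {
    "payments-api": EscalationPolicy(tiers=[["alice"], ["bob", "carol"], ["eng-manager"]]),
}


def page_until_ack(service, responders_who_ack):
    """Simulate paging tier by tier until someone in the current tier acknowledges.

    Returns (first responder who acked or None, list of (minute, person) pages sent).
    """
    policy = ROUTING[service]
    pages = []
    for level, tier in enumerate(policy.tiers):
        minute = level * policy.ack_timeout_min  # each tier waits out the previous window
        pages.extend((minute, person) for person in tier)
        acked = [person for person in tier if person in responders_who_ack]
        if acked:
            return acked[0], pages
    return None, pages  # nobody acknowledged: the policy is exhausted
```

For example, `page_until_ack("payments-api", {"carol"})` pages alice at minute 0, then bob and carol at minute 5, and returns `"carol"` as the acknowledging responder. A real on-call platform would also resolve who holds each tier from rotation schedules; this sketch only models the escalation order.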
Roles: Labeling the Board So No One Gets Lost
In a busy incident room, chaos doesn’t come from a lack of talent—it comes from a lack of clarity. If everyone is trying to do everything, you get duplication, confusion, and missed steps.
That’s why clearly defined roles are crucial. Think of them as labeled sections on your paper wall, so everyone knows which part they’re responsible for.
Common roles include:
- Incident Commander (IC) – Owns overall coordination, decision-making, and prioritization. The IC doesn’t fix the issue; they manage the response.
- Technical Lead(s) – Own one or more affected systems or domains. They guide diagnosis and remediation efforts.
- Communications Lead – Manages internal and external updates, ensuring messaging is accurate, timely, and audience-appropriate.
- Scribe / Incident Recorder – Tracks timelines, decisions, and key events for post-incident analysis.
In your tooling, these roles can be captured explicitly:
- Tagged in the incident channel (e.g., @ic, @comms-lead)
- Recorded in the incident ticket
- Reflected in your on-call system (e.g., a dedicated IC rotation)
The aim is simple: at any moment, anyone should be able to answer “Who’s in charge? Who’s fixing what? Who’s talking to whom?” without guesswork.
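Captured as data, roles become something tooling can check. The sketch below, in which `IncidentRoles`, `REQUIRED_ROLES`, and the role keys are hypothetical names, records who holds each role and which technical lead owns which system, and flags roles nobody has picked up yet, so those three questions can be answered by lookup rather than by asking around.

```python
from dataclasses import dataclass, field

# Hypothetical role keys that every major incident should have filled.
REQUIRED_ROLES = ("incident_commander", "comms_lead", "scribe")


@dataclass
class IncidentRoles:
    roles: dict[str, str] = field(default_factory=dict)       # role -> person
    tech_leads: dict[str, str] = field(default_factory=dict)  # system -> technical lead

    def assign(self, role, person):
        self.roles[role] = person

    def in_charge(self):
        """Who's in charge? Returns an explicit marker instead of guessing."""
        return self.roles.get("incident_commander", "UNASSIGNED")

    def lead_for(self, system):
        """Who's fixing what?"""
        return self.tech_leads.get(system, "UNASSIGNED")

    def missing_roles(self):
        """Required roles that still need an owner, in REQUIRED_ROLES order."""
        return [role for role in REQUIRED_ROLES if role not in self.roles]
```

Returning `"UNASSIGNED"` rather than `None` keeps the gap visible wherever the value is displayed, which is the digital equivalent of an empty label on the wall.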
Communication Cadence: Signals on the Wall, Not Just in the Room
An incident isn’t just a technical event—it’s also a communication event. Internal teams, leadership, and customers all experience it differently, but they share the same anxiety: What is happening, and what does it mean for me?
Effective incident communication depends on two things:
1. Timely, Regular Updates
Silence creates confusion and speculation. Even if you don’t have all the answers, you can still:
- Acknowledge the issue
- Share what you do know
- Outline your next steps
- Commit to the next update time
It’s often better to say, “We’re investigating; next update in 15 minutes” than to wait until you have a complete root cause.
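A holding update like that can be templated so responders fill in facts rather than compose prose under pressure. This is a minimal sketch: `holding_update` and its parameters are illustrative, and it assumes `now` is a timezone-aware UTC datetime. It covers the four elements listed above: acknowledgment, what is known, next steps, and a committed next-update time.

```python
from datetime import datetime, timedelta, timezone


def holding_update(issue: str, known: list[str], next_steps: list[str],
                   now: datetime, cadence_min: int = 15) -> str:
    """Compose an update that acknowledges the issue, shares what is known,
    outlines next steps, and commits to the next update time (assumes UTC)."""
    next_update = now + timedelta(minutes=cadence_min)
    known_text = "; ".join(known) if known else "still gathering details"
    steps_text = "; ".join(next_steps) if next_steps else "triage in progress"
    return (
        f"We're aware of an issue: {issue}.\n"
        f"What we know: {known_text}.\n"
        f"Next steps: {steps_text}.\n"
        f"Next update by {next_update:%H:%M} UTC."
    )


if __name__ == "__main__":
    print(holding_update("elevated checkout errors", [], [],
                         datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)))
```

Called at 12:00 UTC with the default 15-minute cadence, the message ends with "Next update by 12:15 UTC.", so the commitment is always stated even when the other fields are still thin.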
You can automate parts of this:
- Predefined update cadences (e.g., every 15–30 minutes for major incidents)
- Templates for internal and external notifications
- Auto-populated status pages from incident tooling
Think of these as status cards pinned to the wall: always visible, always refreshed.
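The cadence itself is simple arithmetic that tooling can enforce. The sketch below assumes a hypothetical severity-to-interval table (15 minutes for `sev1`, 30 for `sev2`, within the 15–30 minute range mentioned above) and computes when updates fall due and whether the next promised update has been missed.

```python
from datetime import datetime, timedelta

# Hypothetical policy: minutes between updates, by severity.
CADENCE_MIN = {"sev1": 15, "sev2": 30}


def update_due_times(severity: str, start: datetime, until: datetime) -> list[datetime]:
    """All scheduled update times after `start`, up to and including `until`."""
    step = timedelta(minutes=CADENCE_MIN[severity])
    due, t = [], start + step
    while t <= until:
        due.append(t)
        t += step
    return due


def is_overdue(severity: str, last_update: datetime, now: datetime) -> bool:
    """True once the promised next-update time has passed with no new update."""
    return now > last_update + timedelta(minutes=CADENCE_MIN[severity])
```

Keying `is_overdue` off the last update rather than the incident start means an early, unscheduled update resets the clock, which matches how the "next update in N minutes" promise is usually phrased.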
2. Audience-Aware Messaging
Not everyone needs the same level of detail. Tailoring communication prevents overload while maintaining trust.
- Internal technical teams need:
  - Error rates, latencies, logs, and metrics
  - Hypotheses under investigation
  - Concrete next actions and owners
- Business stakeholders and leadership need:
  - Impact on customers and revenue
  - Estimated time to mitigation or workaround
  - Risk and decision points (e.g., rollback, feature flags)
- Customers need:
  - Clear acknowledgment of the issue
  - Plain-language description of impact
  - Reassurance that it’s being worked on
  - Honest expectations around timelines
In your incident platform, that might look like:
- One technical incident room (Slack/Teams) for deep problem solving
- One or more stakeholder channels or email lists for summaries
- A public status page for customers
Each is a different pane on your metaphorical switchboard wall—drawing from the same underlying reality but expressed at different levels of abstraction.
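One way to keep those panes consistent is to render every audience's message from a single incident record, so the same facts appear at different levels of detail. In this sketch, `IncidentState`, its fields, `render`, and the audience names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IncidentState:
    """Single source of truth that every audience-specific message draws from."""
    impact_plain: str              # plain-language impact, safe for customers
    error_rate_pct: float
    p99_latency_ms: int
    hypotheses: list[str] = field(default_factory=list)
    business_impact: str = "under assessment"
    eta: str = "unknown"
    decision_needed: Optional[str] = None


def render(state: IncidentState, audience: str) -> str:
    if audience == "technical":
        # Metrics and hypotheses for the people doing the debugging.
        return (f"error_rate={state.error_rate_pct:.1f}% p99={state.p99_latency_ms}ms; "
                f"hypotheses: {', '.join(state.hypotheses) or 'none yet'}")
    if audience == "leadership":
        # Impact, timing, and any decision that needs an owner.
        msg = f"Impact: {state.business_impact}. ETA to mitigation: {state.eta}."
        if state.decision_needed:
            msg += f" Decision needed: {state.decision_needed}."
        return msg
    if audience == "customers":
        # No metrics or internal hypotheses: acknowledgment, impact, honest timing.
        return (f"We're aware that {state.impact_plain}. Our team is working on it. "
                f"Current estimate: {state.eta}.")
    raise ValueError(f"unknown audience: {audience!r}")
```

Because every channel reads from the same record, correcting one field (say, the ETA) updates the technical room, the stakeholder summary, and the status page text together.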
Dependency Graphs: String, Clips, and Blast Radius
In our paper switchboard analogy, dependency graphs are the web of string connecting systems and services. When one card (service) is pulled down, you can immediately see which other cards are tugged out of place.
In real incidents, understanding these dependencies is often the difference between guessing and making informed decisions.
A good dependency graph helps you:
- See which systems depend on the failing component (blast radius)
- Prioritize actions based on customer and business impact
- Identify safe levers (e.g., feature flags, traffic shaping, failover targets)
- Sequence remediation steps to avoid cascading failures
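Computing a blast radius amounts to walking the graph backwards: start at the failing service and collect everything that depends on it, directly or transitively. Here is a breadth-first sketch, assuming a hypothetical `DEPENDS_ON` map from each service to the services it calls; the service names are illustrative.

```python
from collections import deque

# Hypothetical service map: each service -> the services it depends on.
DEPENDS_ON = {
    "web-checkout": ["payments-api", "cart"],
    "mobile-app": ["payments-api"],
    "payments-api": ["payments-db", "fraud-check"],
    "cart": ["inventory"],
    "fraud-check": [],
    "payments-db": [],
    "inventory": [],
}


def blast_radius(failed, depends_on):
    """Return every service that depends on `failed`, directly or transitively."""
    # Invert the edges: for each service, which services call it directly?
    dependents = {svc: [] for svc in depends_on}
    for svc, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(svc)
    impacted, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for consumer in dependents.get(current, []):
            if consumer != failed and consumer not in impacted:
                impacted.add(consumer)
                queue.append(consumer)
    return impacted
```

With this map, `blast_radius("payments-db", DEPENDS_ON)` returns `{"payments-api", "web-checkout", "mobile-app"}`. That is the raw blast radius; prioritizing within it still depends on customer and business impact, which the graph alone does not encode.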
During an incident, this visualization might answer questions like:
- “If the payments gateway is degraded, which regions or products are actually impacted?”
- “If we rate-limit Service A, what happens to Services B and C?”
- “If we roll back this deployment, do we break a dependency that expects the new API?”
The most powerful setups integrate dependency graphs directly into your incident tooling:
- Clicking an incident shows affected services and their owners
- On-call routing uses the graph to engage all relevant teams
- Dashboards overlay real-time health metrics on top of dependencies
In effect, you’re turning that chaotic ball of string into a navigable map you can actually reason with.
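Wiring the graph into routing can be as simple as joining the impacted set with an ownership table. In this sketch, `DEPENDS_ON`, `OWNERS`, `affected_by_owner`, and the team names are hypothetical. It expands the impacted set to a fixed point (repeatedly adding any service that calls an already-impacted one) and groups the result by owning team, which is the list an on-call integration could page and an incident view could display.

```python
# Hypothetical service map (service -> services it calls) and ownership table.
DEPENDS_ON = {
    "web-checkout": ["payments-api", "cart"],
    "mobile-app": ["payments-api"],
    "payments-api": ["payments-db"],
    "cart": ["inventory"],
}
OWNERS = {
    "web-checkout": "storefront",
    "cart": "storefront",
    "mobile-app": "mobile",
    "payments-api": "payments",
    "payments-db": "data-platform",
    "inventory": "supply-chain",
}


def affected_by_owner(failed, depends_on, owners):
    """Map each owning team to the impacted services it should be engaged for."""
    impacted = {failed}
    changed = True
    while changed:  # fixed point: stop once a full pass adds nothing new
        changed = False
        for service, deps in depends_on.items():
            if service not in impacted and impacted.intersection(deps):
                impacted.add(service)
                changed = True
    by_owner = {}
    for service in sorted(impacted):
        # Services missing from the ownership table surface as "unowned".
        by_owner.setdefault(owners.get(service, "unowned"), []).append(service)
    return by_owner
```

The fixed-point loop is quadratic in the worst case, which is fine for a hand-maintained map; a breadth-first walk over inverted edges scales better for large graphs. Surfacing `"unowned"` explicitly turns a missing ownership record into something responders notice during the incident rather than after it.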
Putting It All Together: A Cohesive Incident Command System
When you combine these pieces—on-call management, clear roles, thoughtful communication, and dependency graphs—you get something greater than the sum of its parts: a coherent incident command system.
In practice, that might look like this for a major incident:
- Detection & Routing – Monitoring spots an anomaly and triggers an alert. Your on-call platform routes it to the right team and opens an incident channel, ticket, and bridge.
- Role Assignment – The first responder self-assigns as Incident Commander or pages the designated IC. Communications and technical leads are identified and recorded.
- Dependency-Aware Triage – The team pulls up dependency graphs to assess blast radius. They identify which services, regions, and customers are impacted and prioritize mitigation steps.
- Structured Communication
  - IC coordinates internal technical work
  - Comms lead sends regular updates to stakeholders and customers, using tailored templates and cadences
  - A scribe records key events and decisions
- Iterative Mitigation & Resolution – Guided by the dependency map and real-time data, teams roll back changes, apply workarounds, or shift traffic. All actions and their effects are visible in the incident tooling.
- Closure & Learning – Once stable, the incident is closed, timelines reviewed, and insights captured. Automation can attach logs, metrics, and chat transcripts for post-incident reviews.
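These phases behave like a small state machine, and enforcing the transitions is one way tooling can keep the timeline honest for the scribe and the post-incident review. This is a sketch: the state names, the allowed transitions (including the loop back from mitigation to triage when a fix does not hold), and `IncidentLifecycle` are assumptions, not any platform's model. Structured communication is treated as running alongside every state rather than as a state of its own.

```python
from datetime import datetime, timezone

# Hypothetical lifecycle mirroring the phases above.
TRANSITIONS = {
    "detected": {"roles_assigned"},
    "roles_assigned": {"triaging"},
    "triaging": {"mitigating"},
    "mitigating": {"resolved", "triaging"},  # back to triage if a fix does not hold
    "resolved": {"closed", "mitigating"},    # resume mitigation if the issue regresses
    "closed": set(),
}


class IncidentLifecycle:
    def __init__(self, now: datetime):
        self.state = "detected"
        self.timeline = [(now, "detected", "alert received")]

    def advance(self, new_state: str, note: str, now: datetime) -> None:
        """Move to new_state if the transition is allowed, recording it on the timeline."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot move from {self.state!r} to {new_state!r}")
        self.state = new_state
        self.timeline.append((now, new_state, note))

    def timeline_text(self) -> str:
        """Plain-text timeline suitable for pasting into a post-incident review."""
        return "\n".join(f"{t:%H:%M} {state}: {note}" for t, state, note in self.timeline)


if __name__ == "__main__":
    inc = IncidentLifecycle(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
    inc.advance("roles_assigned", "IC: dana", datetime(2024, 1, 1, 12, 4, tzinfo=timezone.utc))
    print(inc.timeline_text())
```

Rejecting a jump such as `detected` straight to `closed` forces every incident through role assignment and triage, so the recorded timeline always contains the steps a review will ask about.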
The result is not just faster resolution, but less cognitive load and less operational toil. People spend more time solving the problem and less time figuring out who to talk to, what’s connected to what, and how to keep everyone informed.
Conclusion: Build the Wall First, Then Automate It
The paper switchboard isn’t a nostalgia exercise—it’s a design blueprint.
Before chasing the newest tool or automation, ask:
- If this were a wall in a room, what would need to be visible at all times?
- Who would be on which cards? What string would connect what?
- Where would you pin the current incident? Where would you write updates?
Once you can sketch that on paper, you can look to platforms like xMatters and other incident tools to bring that model to life, integrating:
- On-call and escalation workflows
- Role assignment and coordination
- Audience-specific communication
- Dynamic dependency graphs and impact views
The future of incident management isn’t about more dashboards for their own sake. It’s about orchestrating a clear, shared understanding of reality—who’s involved, what’s broken, what depends on what, and how to move from chaos to control.
Whether with string and clips or APIs and webhooks, the goal is the same: a living switchboard that turns complex incidents into manageable, coordinated responses.