Rain Lag

The Analog Incident Compass Garden Shed: A Paper Toolbench for Quietly Tuning Your On‑Call Rituals

How to design humane, reliable on‑call systems using paper, pencils, and a few simple rules—so incidents are handled quickly and teams stay connected instead of burnt out.

Most teams treat on‑call as a digital problem: more dashboards, more bots, more rules, more tools. But the real friction rarely comes from missing software. It comes from missing clarity, missing rituals, and missing human connection.

Think of your incident process less like a high‑tech control room and more like a small garden shed: a quiet place at the edge of your operational “garden,” where you keep the simple tools that make everything else work better.

This post explores how to build that shed: an Analog Incident Compass—a paper toolbench for designing and tuning your on‑call rituals. You’ll walk away with:

  • A simple, robust on‑call rotation pattern
  • Clear escalation rules with fixed, short windows
  • Leadership involvement at the right time, not too late
  • Practices for iterating quietly without drama
  • Analog rituals that counter digital overload
  • Ways to make on‑call feel supported instead of isolating

Why an “Analog Incident Compass”?

When incidents hit, cognitive load explodes:

  • Multiple dashboards
  • Pager notifications
  • Chat threads
  • Status pages
  • Stakeholders asking for updates

It’s easy to add even more digital systems in the name of reliability. But more screens don’t automatically mean better decisions.

Analog tools—paper, whiteboards, sticky notes—work because they are slow, simple, and finite. They help you:

  • Externalize complexity: what’s in the notebook is not in your head
  • Make rules visible: “this is how we escalate” is written down, not implied
  • Tune processes quietly: erase, rewrite, adjust over time

Your Analog Incident Compass is a small, shared, physical set of artifacts that keep your on‑call system humane and understandable—especially under stress.


1. Design the Rotation: Last Week’s Primary Becomes This Week’s Secondary

The first tool in your shed is the rotation pattern. A small change here keeps fresh context in the escalation path, which makes handoffs during an incident faster.

The Pattern

Use this simple rule:

Last week’s primary is this week’s secondary.

So if you have a rotation like:

  • Week 1: Primary = Alex, Secondary = Bailey
  • Week 2: Primary = Bailey, Secondary = Alex
  • Week 3: Primary = Casey, Secondary = Bailey

…then the person who just finished holding the pager remains in the loop as backup. They still have context:

  • Recent incidents and their root causes
  • Known flaky systems and partial fixes
  • Work‑in‑progress mitigations

This design means escalations are faster and smoother, because the secondary isn’t cold—they’re warm with fresh memories.
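If you want to pre-fill the paper calendar, the rule above can be sketched in a few lines of Python. This is only an illustration: the function name `build_rotation` and the choice to give Week 1's secondary slot to the last person in the list (since Week 1 has no "last week") are assumptions, not part of the pattern itself.

```python
def build_rotation(engineers, weeks):
    """Return (week, primary, secondary) rows where each week's
    secondary is the previous week's primary."""
    if len(engineers) < 2:
        raise ValueError("need at least two people for a primary/secondary pair")
    rows = []
    for i in range(weeks):
        primary = engineers[i % len(engineers)]
        # Assumption: for the first week, wrap around to the last name in the list.
        secondary = engineers[(i - 1) % len(engineers)]
        rows.append((i + 1, primary, secondary))
    return rows


rotation = build_rotation(["Alex", "Bailey", "Casey"], weeks=4)
for week, primary, secondary in rotation:
    print(f"Week {week}: Primary = {primary}, Secondary = {secondary}")
```

Copy the printed rows onto the calendar in pencil. Swaps and holidays still get handled by hand, which is the point.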

How to Make It Analog

Create a paper rotation calendar:

  • One page per quarter
  • Columns for: Week, Primary, Secondary, Manager On‑Duty
  • Pencil, not pen—you will adjust this

Put it on a wall or in a shared notebook. During team meetings, physically point at it when discussing upcoming on‑call. That tiny ritual reinforces shared ownership and visibility.


2. Draw the Paging Chain: Primary → Secondary → Engineering Manager

On‑call chaos often comes from one simple gap: no one is quite sure what happens if the first person doesn’t respond.

Fix that with a clear paging chain.

The Chain

Define a single, visible sequence:

  1. Primary — first responder, owns triage
  2. Secondary — warm backup, steps in on escalation
  3. Engineering Manager (or equivalent leader) — ensures ownership, support, and stakeholder communication

Write this as a simple flow on paper:

Incident occurs → Page Primary → (if no ack) → Page Secondary → (if still no ack) → Page EM

Time Windows (Short and Explicit)

The chain only works if each step has a short, explicit time window. No ambiguity.

Example:

  • Primary: 0–5 minutes to acknowledge
  • Secondary: 5–10 minutes to acknowledge
  • Engineering Manager: 10–15 minutes to take ownership and coordinate

You can tune these numbers, but they must be:

  • Written down
  • Communicated to everyone
  • Reflected in your alerting tool
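To check that the written windows and the alerting tool agree, it can help to express the chain as plain data. The sketch below uses the example windows above; the names `Step`, `CHAIN`, and `role_to_page` are hypothetical and not tied to any particular paging product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    role: str
    start_min: int  # minutes after the incident opens when this role is paged
    end_min: int    # minute at which the page moves on if there is no ack


# Example windows from the text; tune these for your team.
CHAIN = [
    Step("Primary", 0, 5),
    Step("Secondary", 5, 10),
    Step("Engineering Manager", 10, 15),
]


def role_to_page(elapsed_min, chain=CHAIN):
    """Return the role holding an unacknowledged page at `elapsed_min`,
    or None once the last window has passed."""
    for step in chain:
        if step.start_min <= elapsed_min < step.end_min:
            return step.role
    return None


print(role_to_page(7))
```

A `None` result means the whole chain has been exhausted without an acknowledgment, which deserves its own rule.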

How to Make It Analog

On a single sheet of paper, draw:

  • A vertical flow from Incident down to Stable / Owned
  • Each node labeled with: Role, Time Window

Post this where you do incident reviews. That way, every discussion of “what went wrong” is tethered to “what should have happened” at a single glance.


3. Fix the Acknowledgment Window: 5 Minutes Max

Wobbly acknowledgment rules (“I’ll get to it when I see it”) quietly kill reliability.

Adopt a fixed acknowledgment window:

Primary has 5 minutes to acknowledge the page before it auto‑escalates.

This does not mean the incident must be fixed in 5 minutes—it just means a human must say “I see this and I’m on it.”

Why this matters:

  • It removes guesswork: no arguments about “how long we should wait”
  • It protects customers: the system never waits around hoping someone is awake
  • It protects responders: clear expectations make boundaries easier to hold

How to Make It Analog

In your Incident Compass notebook, dedicate a spread called “Pager Promises”:

On the left page, write:

  • Primary: 5 minutes to acknowledge
  • Secondary: 5 minutes after primary fails
  • EM: 5 minutes after secondary fails

On the right page, leave space for notes during post‑incident reviews:

  • Did we meet these? Y/N
  • If not, what small tweak might help? (e.g., backup phone, different tool, rotation adjustment)

This keeps attention on behavior and improvement, not blame.
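If your paging tool can export when each role was paged and when they acknowledged, the "Did we meet these? Y/N" column can be filled in with a small helper before the review. This is a minimal sketch: `promise_met` and `review` are hypothetical names, times are minutes since the incident opened, and treating an ack at exactly the 5-minute mark as "met" is an assumption you may want to change.

```python
ACK_WINDOW_MIN = 5  # the fixed acknowledgment window from the text


def promise_met(paged_at_min, acked_at_min, window_min=ACK_WINDOW_MIN):
    """True if the page was acknowledged within the window.
    `acked_at_min` is None when nobody acknowledged."""
    if acked_at_min is None:
        return False
    return 0 <= acked_at_min - paged_at_min <= window_min


def review(pages):
    """Map each role to "Y" or "N" for the Pager Promises spread."""
    return {
        role: "Y" if promise_met(paged, acked) else "N"
        for role, paged, acked in pages
    }


# Made-up example: the primary never acked, the secondary acked 3 minutes after being paged.
pages = [("Primary", 0, None), ("Secondary", 5, 8)]
result = review(pages)
print(result)
```

The output is only a starting point for the right-hand page. The useful part is still the handwritten note about what small tweak might help.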


4. Cap Total Response Time: 15 Minutes to Leadership Involvement

Some incidents don’t need leadership. But when they do, the worst outcome is late involvement: hours of confusion, no clear owner, and mounting customer pain.

Define a maximum total response time:

If no one has acknowledged and taken ownership within 15 minutes, leadership is automatically involved.

This doesn’t mean escalating every minor blip. It means:

  • If the system cannot confirm someone owns the incident within 15 minutes, that is itself an incident
  • Leadership’s role is to restore ownership, not to fix the technical issue

They might:

  • Reassign responders
  • Notify stakeholders
  • Make prioritization calls
  • Decide whether to page other teams

How to Make It Analog

On a bright‑colored card (index card works well), write in large letters:

“No unowned incident after 15 minutes.”

Pin this near your team’s workspace or camera. It’s a simple, constant reminder: our commitment is not perfection, it is ownership.


5. Treat On‑Call as a Ritual You Quietly Tune

On‑call should not be a fixed, painful law of the land. It should be treated as a ritual—something you refine over time with care and attention.

Rituals are:

  • Intentional
  • Repeatable
  • Reflective

Adopt a cadence of quiet tuning:

  • After each significant incident, add one note to your Incident Compass notebook:
    • What worked well?
    • What felt confusing?
    • What one small rule change might help?
  • Once a month, review those pages and choose one change to test

Changes could include:

  • Adjusting rotation length (1 week vs 2 weeks)
  • Clarifying escalation for particular services
  • Updating who the EM backup is on weekends

The key is: small adjustments, often.

How to Make It Analog

Create a dedicated section in your notebook called “Ritual Experiments.” For each experiment, write:

  • Name (e.g., “5‑minute EM heads‑up”)
  • Start date / End date
  • What we changed
  • What we observed
  • Keep / revert / adjust

Over time, this becomes a log of your team’s evolving wisdom—not just a record of outages.


6. Use Low‑Tech Rituals to Counter Digital Overload

Digital tools are necessary. But they’re not sufficient, and they often create extra noise.

Complement them with low‑tech rituals that keep your nervous system calm:

  • Pre‑shift paper check‑in (5 minutes)
    Before your on‑call week starts, fill one page:

    • “What systems worry me most this week?”
    • “What runbooks should I skim today?”
    • “Who can I ask for help quickly?”
  • Single‑page incident log
    During an incident, write by hand:

    • Time, Event, Decision, Next Check

    This reduces context thrash and gives you a sanity anchor.
  • Post‑incident reflection card
    After a major incident, give the primary and secondary each a small card:

    • One thing that made this easier
    • One thing that made this harder

    Collect and review these monthly.

These rituals are intentionally small. They’re not extra bureaucracy; they are emotional and cognitive guardrails.


7. Build Team Connection Into On‑Call

The worst on‑call systems make responders feel isolated and blamed. The best make them feel supported, prepared, and connected.

Design for connection explicitly:

  • Buddy intros at rotation handoff
    When roles change, primary and secondary do a 10‑minute sync:

    • Review last week’s incidents
    • Share “watch out for…” notes
    • Confirm contact preferences
  • Manager as supporter, not judge
    When managers enter the escalation chain, their first question should be:

    • “How can I help?”
      Not “Why did this happen?”
  • Shared ownership of runbooks
    Keep a physical folder or binder of critical runbooks. Once a quarter, pair people up to walk through one runbook together and mark what’s outdated.

These practices send a clear message: being on‑call means you are trusted and backed, not alone.


Conclusion: Step Into the Garden Shed

Reliability isn’t just about dashboards and SLOs. It’s about clear roles, short response windows, and humane rituals that help humans stay grounded when things go wrong.

By building an Analog Incident Compass—a small, paper‑based toolbench—you can:

  • Design rotations that preserve context (last week’s primary becomes this week’s secondary)
  • Make escalation rules visible and time‑bound (Primary → Secondary → EM with fixed windows)
  • Guarantee ownership within a maximum response time (e.g., 15 minutes)
  • Quietly tune your on‑call rituals over time, instead of swinging between extremes
  • Ground your team with low‑tech practices in a high‑tech environment
  • Turn on‑call from a lonely burden into a shared, supported responsibility

You don’t need a new platform to start. You need a notebook, a pencil, and a team willing to step into the metaphorical garden shed and ask:

“What tiny change could make the next incident a little clearer, a little kinder, and a little more reliable?”

Start there. Write it down. Tune quietly. Your future, calmer on‑call self will thank you.
