Rain Lag

The Analog Incident Compass Box: Building a Pocket-Sized Calm Kit for Your Worst On-Call Nights

How to design a pocket-sized, analog "incident compass" using engineering principles to stay calm, reduce cognitive load, and communicate clearly during your worst on-call shifts.


On-call work in software engineering sits at a strange intersection. Most days, programming is quiet, methodical, even meditative. Then, at 2:37 a.m. on a Sunday, your phone explodes with alerts, dashboards turn red, and suddenly your heart rate looks like a DDoS graph.

This is the reality of modern software: both relaxed and deeply stressful, with on-call rotations acting as pressure cookers that concentrate the worst of that stress into a few unreasonably intense hours.

In this post, we’ll design a physical, analog "Incident Compass Box"—a pocket-sized calm kit for your worst on-call nights—using the same engineering principles we already trust: natural science, math, and design processes. The idea is simple: if we can engineer resilient distributed systems, we can also engineer calmer, more resilient humans inside those systems.


Why an "Analog" Calm Kit?

Digital tools are fantastic—until they aren’t. In a high-stress incident, your brain is overloaded, Slack is noisy, dashboards are laggy, and context is fractured across tabs.

An analog calm kit is deliberately low-tech:

  • It doesn’t crash.
  • It doesn’t send you notifications.
  • It lives on your desk, in your bag, or in the incident room.
  • It works when your brain is running at 20% capacity.

Think of it as a pocket-sized incident control panel for your nervous system and your decisions. Its core purpose: reduce cognitive load, clarify expectations, and give you something solid to lean on when everything feels fuzzy.


Designing Personal Systems Like We Design Software

We already know how to design robust systems. We:

  • Model the system.
  • Identify failure modes.
  • Reduce single points of failure.
  • Introduce guardrails and defaults.
  • Iterate based on feedback and post-incident data.

Your on-call self is just another system under load. Apply the same principles:

  1. Natural science: Your brain and body are biological systems that behave predictably under stress—heart rate goes up, working memory goes down, decision quality suffers.
  2. Math: Mental load compounds. Each notification or context switch adds a cost, and each open decision branch multiplies the number of paths you have to hold in your head.
  3. Design processes: You can prototype, test, and refine your personal coping systems just like any UX flow.

Your incident compass box is the physical artifact of this thinking: a designed system to guide you when your cognitive "CPU" is throttled.


Core Goals of the Incident Compass Box

Before we list what goes inside, we set design goals. Any calm kit for on-call incidents should focus on:

  1. Reducing cognitive load
    Minimize what you need to remember or decide under stress. Move as much as possible into checklists, prompts, and defaults.

  2. Preventing alert fatigue
    Help you distinguish between noise and signal, and remind you that not every alert is a full-blown incident.

  3. Clarifying communication
    Reduce anxiety by making expectations about communication simple and explicit.

  4. Offloading decisions to runbooks
    Let well-maintained runbooks make the routine choices, so you can focus on the novel or ambiguous parts.

  5. Supporting continuous improvement
    Capture what you learned emotionally and technically, and feed it back into the system so the next incident is easier.


What Goes Into the Analog Incident Compass Box?

You can start with a small pencil case, metal tin, or index card box. Inside, you’ll keep a set of cards, checklists, and prompts. Here’s a suggested "v1".

1. The First 5 Minutes Card

A single index card labeled: "First 5 Minutes". This is your boot sequence. For example:

  1. Breathe (about 1 minute)
    • In 4 seconds, hold 4 seconds, out 6–8 seconds.
    • Repeat 4 times.
  2. Stabilize your context
    • Plug in laptop & connect to stable network.
    • Open incident channel / bridge.
    • Confirm you have access to monitoring & logs.
  3. Name the incident
    • Create or join the incident in your tooling.
    • Write a one-line summary: "X appears broken for Y users".
  4. Declare roles (if needed)
    • Incident commander, communicator, primary engineer.

This is not about being perfect; it’s about not forgetting the basics when adrenaline hits.


2. Communication Checklists: Anxiety-Reduction by Clarity

Communication is where a lot of incident anxiety lives: Who should I update? What do I say? Am I doing enough?

A simple communication checklist can dramatically reduce that anxiety.

Create a small card or two with:

Incident Communication Checklist

  • Is an incident channel/room created and named?
  • Is there a clearly identified incident commander?
  • Have key internal stakeholders been notified? (SRE, support, leadership, etc.)
  • Has a status update been posted to the incident channel?
  • Is there a recurring update cadence? (e.g., every 15–30 minutes)
  • Is someone responsible for external communication (status page, customer success)?

On the back, keep message templates:

Kickoff message
"We are investigating an incident affecting [scope]. Current impact: [what users see]. We’re in [channel/bridge]. Next update by [time]."

Update message
"Update [time]: We are [investigating / testing a fix / rolling back]. Impact remains [unchanged / improving / worsening]. Next update by [time]."

These templates make expectations concrete and prevent the mental spiral of "I should say something, but I don’t know what."
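If your team also posts updates from a script or bot, the same two templates can be filled in programmatically. The following is a minimal sketch, not a reference implementation: the function names, the field names, and the 30-minute default cadence are all assumptions made for illustration, and the wording simply mirrors the card above.

```python
from datetime import datetime, timedelta

# Templates copied from the card; the bracketed slots become named fields.
# Field names (scope, impact, channel, ...) are hypothetical.
KICKOFF = (
    "We are investigating an incident affecting {scope}. "
    "Current impact: {impact}. We're in {channel}. "
    "Next update by {next_update}."
)
UPDATE = (
    "Update {now}: We are {action}. Impact remains {trend}. "
    "Next update by {next_update}."
)


def next_update_time(now: datetime, cadence_minutes: int = 30) -> str:
    """Return the next promised update time, formatted as HH:MM."""
    return (now + timedelta(minutes=cadence_minutes)).strftime("%H:%M")


def kickoff_message(scope, impact, channel, now, cadence_minutes=30):
    """Fill the kickoff template and commit to a next-update time."""
    return KICKOFF.format(
        scope=scope,
        impact=impact,
        channel=channel,
        next_update=next_update_time(now, cadence_minutes),
    )


def update_message(action, trend, now, cadence_minutes=30):
    """Fill the update template with the current time and the next one."""
    return UPDATE.format(
        now=now.strftime("%H:%M"),
        action=action,
        trend=trend,
        next_update=next_update_time(now, cadence_minutes),
    )


if __name__ == "__main__":
    started = datetime(2024, 1, 7, 2, 37)
    print(kickoff_message("checkout", "payments fail", "#inc-123", started, 15))
    print(update_message("rolling back", "improving", started))
```

Computing the "next update by" time from the cadence is the useful part: it removes one small decision from every message you send.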


3. Runbook Reminder Cards: Trust the System

Your runbooks live in Confluence, Git, or an incident tool—but your brain needs to remember to use them.

Add a brightly colored card:

"Check the Runbooks" Card

Front:

  • Search for an incident / runbook related to:
    • The affected service.
    • The error pattern or alert name.
  • If a relevant runbook exists, follow it step-by-step before improvising.

Back:

  • If the runbook is outdated or missing:
    • Jot a quick note.
    • After the incident, add or update the runbook during the review.

Well-maintained incident runbooks are not just operational tools; they are psychological safety nets. The existence of a tested path reduces decision fatigue and gives you permission to follow a known route instead of reinventing the wheel.


4. Triage & Decision Cards: Reducing Cognitive Branching

During incidents, decision trees proliferate: Should we roll back? Scale up? Page more people?

Create a triage card with simple decision triggers, based on your org’s SLOs and policies:

Triage Cheat Sheet

  • Is this impacting production customers?
    • If yes → Treat as P1/P2, follow major incident flow.
  • Is data at risk?
    • If yes → Immediately escalate to security / data owners.
  • Is there a safe rollback?
    • If yes and recent change suspected → Prefer rollback over complex forward-fix.
  • Are we breaching SLOs?
    • If yes → Log the breach for post-incident review.

This is just applying bounded decision-making: limit branches, set clear thresholds, and use defaults where possible.
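To see how few branches the cheat sheet really has, it can be written out as a handful of independent checks. This is only a sketch under stated assumptions: the `Signals` fields and the `triage` function are hypothetical names, and the real thresholds and actions should come from your org's SLOs and policies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Signals:
    """Yes/no answers to the questions on the triage card (hypothetical fields)."""
    customer_impact: bool
    data_at_risk: bool
    safe_rollback: bool
    recent_change_suspected: bool
    slo_breached: bool


def triage(s: Signals) -> List[str]:
    """Return the default actions from the card, in card order."""
    actions = []
    if s.customer_impact:
        actions.append("Treat as P1/P2 and follow the major incident flow")
    if s.data_at_risk:
        actions.append("Escalate immediately to security / data owners")
    # Rollback is the default only when both conditions hold.
    if s.safe_rollback and s.recent_change_suspected:
        actions.append("Prefer rollback over a complex forward-fix")
    if s.slo_breached:
        actions.append("Log the SLO breach for post-incident review")
    return actions


if __name__ == "__main__":
    answers = Signals(
        customer_impact=True,
        data_at_risk=False,
        safe_rollback=True,
        recent_change_suspected=True,
        slo_breached=False,
    )
    for action in triage(answers):
        print("-", action)
```

Each question is checked on its own rather than nested inside another, which is why the card stays readable at 3 a.m.: no answer changes which question you ask next.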


5. Personal Coping Prompts: Engineering Your Own Nervous System

Stress reactions are physical. You can treat them systematically, too.

Include a small card labeled "When I’m Overwhelmed".

Examples:

  • Take 60 seconds away from keyboard, stand up, drink water.
  • Ask explicitly: "Can someone else take over as incident commander / note taker for 10 minutes?"
  • Use a brief grounding exercise: name 5 things you can see, 4 you can touch, 3 you can hear, 2 you can smell, 1 you can taste.
  • Remind yourself: "Incidents are a system property, not a personal failure."

This is where natural science meets design: understanding your physiological limits and building small control loops to stay within them.


6. Post-Incident Review Cards: Continuous Improvement for Both Systems and Humans

Your calm kit only gets better if you close the loop.

Create a Post-Incident Reflection card with two sides:

Technical Reflection

  • What surprised us technically?
  • Which runbooks helped or failed?
  • Where did tooling slow us down or save us?
  • What should we update in alerts, dashboards, or runbooks?

Emotional / Process Reflection

  • When did I feel most overwhelmed? Why?
  • What helped me feel safer or more in control?
  • Which checklist, prompt, or tool actually helped?
  • What one change could make the next incident easier for me?

These notes then inform:

  • Updates to the analog kit (new cards, revised prompts).
  • Updates to digital tooling (better alerts, runbooks, automation).
  • Team practices (on-call handoffs, rotations, shadowing, training).

This is incident management as a living engineering project, not a static ritual.


Learning From Past Patterns: Smarter Tools, Calmer Humans

Your analog box is your physical safety net, but your digital systems can also learn.

Incident tools and internal platforms can:

  • Suggest likely runbooks based on similar past incidents.
  • Auto-fill incident timelines with relevant events.
  • Recommend who to page based on historical resolution patterns.
  • Surface prior incident reports for similar symptoms.

The effect is psychological as much as operational: "We’ve seen this before. We know how to handle it."

This combination—analog prompts for the human, digital intelligence for the system—creates a powerful sense of preparedness instead of panic.
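As one illustration of "suggest likely runbooks based on similar past incidents", a tool could rank past incident summaries by word overlap with the new alert. The sketch below uses Jaccard similarity on word sets, which is a deliberately simple stand-in rather than how any particular incident tool works; the function names, the `(summary, runbook)` history format, and the sample data are all hypothetical.

```python
import re


def tokens(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Shared words divided by total distinct words; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_runbooks(alert_text, past_incidents, top_n=3):
    """Rank runbooks by similarity between the alert and past incident summaries.

    past_incidents is a list of (summary, runbook_name) tuples.
    Runbooks with no word overlap at all are left out.
    """
    query = tokens(alert_text)
    best = {}
    for summary, runbook in past_incidents:
        score = jaccard(query, tokens(summary))
        if score > 0:
            # Keep the best score seen for each runbook.
            best[runbook] = max(score, best.get(runbook, 0.0))
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


if __name__ == "__main__":
    history = [
        ("checkout api 5xx error rate high", "checkout-5xx"),
        ("database replica lag on orders db", "db-replica-lag"),
        ("checkout latency high after deploy", "checkout-rollback"),
    ]
    print(suggest_runbooks("checkout api error rate high", history))
```

A real system would likely use richer signals (alert names, affected services, resolution history), but even a crude ranking like this puts a probable runbook in front of a tired human.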


Bringing It All Together

On-call will probably never be completely calm. Systems fail in surprising ways, traffic spikes at the worst times, and your sleep will occasionally lose a fight with a pager.

But calm is not the absence of incidents; it’s the presence of good systems—for your code and for yourself.

By building an Analog Incident Compass Box, you:

  • Externalize the most important steps into checklists and prompts.
  • Lean on runbooks to offload routine decision-making.
  • Use communication templates to reduce anxiety and clarify expectations.
  • Treat cognitive load and alert fatigue as first-class design constraints.
  • Grow a culture of continuous improvement, where both tooling and emotional coping strategies get better after every incident.

You don’t need the perfect kit to start. A few index cards with your first 5 minutes, communication checklist, and "When I’m overwhelmed" prompts are enough for a v1.

From there, iterate—like any good engineer would.

The next time your phone lights up at 2:37 a.m., you’ll still be tired. You may still feel that little spike of panic.

But you’ll also have a compass.

And that can make all the difference.
