Rain Lag

The Cardboard Reliability Ticket Booth: Selling Tiny Time Slots for Practicing Calm Incidents

How to use ITIL-style tickets, tiny timeboxes, and radio-drill habits to build a culture of calm, reliable incident response—before the real emergencies hit.


Modern reliability work rests on a paradox: we want calm, methodical incident response, but we expect teams to learn that calm only in the middle of real fires.

There’s a better way: build a Cardboard Reliability Ticket Booth.

Not a literal cardboard kiosk (though you could make one!), but a simple metaphor:

  • You “sell” your team tiny time slots—small, well-bounded windows—to practice calm incidents.
  • Those calm incidents are logged and tracked like real ITIL tickets.
  • Communication is scripted and rehearsed like radio drills.

Instead of waiting for production to teach people how to respond under pressure, you create a safe, structured market for practicing those skills in miniature.

In this post, we’ll explore how to:

  • Use ITIL-aligned ticket types to model practice incidents
  • Treat practice like real tickets (with SLAs, comms, and closure notes)
  • Borrow radio-drill style training for communication
  • Equip teams with reliable tools for realistic rehearsal
  • Use small timeboxes as a risk-management technique
  • Make time constraints a first-class design element of practice

Why “Calm Incidents” Belong in Your Ticketing System

Most teams treat practice as something informal: a lunchtime chaos game, a side exercise during retros, or “we’ll run a drill if we have time.”

As a result, practice is the first thing cut when calendars get crowded, and incident response skills improve only in the most expensive possible context: real outages.

Instead, treat calm incidents as first-class operational work:

  • They get tickets.
  • They have owners, priorities, and statuses.
  • They show up in reports and dashboards.

This is where ITIL-style structure helps.

Map Calm Incidents to ITIL-Aligned Ticket Types

Depending on your organization’s ITIL maturity, you can align practice scenarios like this:

  • Incident: A simulated service disruption (e.g., “Checkout latency increased by 500ms”).
  • Problem: A follow-up practice ticket to analyze root causes of the simulated incident.
  • Change: A practice rollback, failover, or configuration change associated with the scenario.
  • Service Request: A scheduled calm-incident drill requested by a team or manager.

Example:

  • Ticket type: Incident (Practice)
  • Summary: [DRILL] Payment Service Timeout Under Peak Load
  • Linked tickets: Problem (Practice Post‑Incident Review), Change (Practice Failover to Region B)
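To make the mapping concrete, here is a minimal Python sketch of a practice ticket record. It assumes a hypothetical in-house schema (`TicketType`, `PracticeTicket`, a `practice` flag, a `linked` list) rather than any particular ticketing product's API; your ITSM tool will have its own fields and link types.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class TicketType(Enum):
    # ITIL-aligned record types reused for practice scenarios
    INCIDENT = "Incident"
    PROBLEM = "Problem"
    CHANGE = "Change"
    SERVICE_REQUEST = "Service Request"


@dataclass
class PracticeTicket:
    ticket_type: TicketType
    summary: str
    practice: bool = True   # lets reports filter drills out of real incident metrics
    linked: list[PracticeTicket] = field(default_factory=list)

    def label(self) -> str:
        return f"{self.ticket_type.value} (Practice)" if self.practice else self.ticket_type.value

    def title(self) -> str:
        # Prefix drills with [DRILL] so nobody mistakes them for a real outage
        return f"[DRILL] {self.summary}" if self.practice else self.summary


incident = PracticeTicket(TicketType.INCIDENT, "Payment Service Timeout Under Peak Load")
incident.linked += [
    PracticeTicket(TicketType.PROBLEM, "Post-Incident Review"),
    PracticeTicket(TicketType.CHANGE, "Failover to Region B"),
]

print(incident.label(), "|", incident.title())
for t in incident.linked:
    print("  linked:", t.label(), "|", t.title())
```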

This lets you:

  • Track practice like real work.
  • Report on participation, frequency, and improvement.
  • Build operational habits that transfer directly to production.

Treat Practice Scenarios Like Real Tickets

If you want calm during real events, practice must look and feel real enough:

  1. Use the same ticketing system as production incidents.

    • Same forms, fields, and workflows.
    • Same severity levels (with a "practice" tag or environment flag).
  2. Assign real roles:

    • Incident Commander
    • Communications Lead
    • Technical Lead(s)
    • Scribe / Note-taker
  3. Follow realistic workflows:

    • Declare the incident.
    • Spin up a call/bridge or chat channel.
    • Communicate status updates on your usual channels.
    • Record timelines, actions, and impact—just like the real thing.
  4. Close the ticket with real artifacts:

    • Summary of what happened in the scenario.
    • What went well (tech + comms).
    • What felt confusing or slow.
    • Clear follow-up tasks.
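The four steps can be encoded as guard rails on a ticket's lifecycle. The sketch below is illustrative only: `DrillTicket`, the role keys, and the closure fields are hypothetical names chosen to mirror the checklist above, and the people named are placeholders.

```python
from dataclasses import dataclass, field

# Hypothetical role and closure-field names, mirroring the checklist above
REQUIRED_ROLES = ("incident_commander", "communications_lead", "technical_lead", "scribe")
REQUIRED_CLOSURE = ("summary", "went_well", "confusing_or_slow", "follow_ups")


@dataclass
class DrillTicket:
    severity: str                      # same severity scale as production
    roles: dict[str, str]
    tags: set[str] = field(default_factory=lambda: {"practice"})
    status: str = "new"
    timeline: list[tuple[str, str]] = field(default_factory=list)
    closure: dict[str, object] = field(default_factory=dict)

    def declare(self, at: str) -> None:
        # Step 2: no declaration until every role has an owner
        missing = [r for r in REQUIRED_ROLES if not self.roles.get(r)]
        if missing:
            raise ValueError(f"cannot declare; unassigned roles: {missing}")
        self.status = "declared"
        self.log(at, "incident declared")

    def log(self, at: str, entry: str) -> None:
        # Step 3: the scribe records timeline, actions, and impact
        self.timeline.append((at, entry))

    def close(self, artifacts: dict[str, object]) -> None:
        # Step 4: closing requires the same artifacts as a real incident
        if self.status != "declared":
            raise ValueError("only a declared drill can be closed")
        missing = [k for k in REQUIRED_CLOSURE if not artifacts.get(k)]
        if missing:
            raise ValueError(f"cannot close; missing artifacts: {missing}")
        self.closure = dict(artifacts)
        self.status = "closed"


ticket = DrillTicket(severity="SEV2", roles={
    "incident_commander": "Alex", "communications_lead": "Dana",
    "technical_lead": "Sam", "scribe": "Lee"})   # placeholder people
ticket.declare("14:00")
ticket.log("14:03", "bridge opened; first status update posted")
ticket.close({
    "summary": "Simulated payment timeouts; failover rehearsed",
    "went_well": ["fast declaration"],
    "confusing_or_slow": ["unclear who owned the status page"],
    "follow_ups": ["add status-page owner to the role template"],
})
print(ticket.status, ticket.timeline)
```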

The goal isn’t to fake an outage; it’s to rehearse how you behave when stakes are high, using the same muscle memory you’ll rely on later.


Borrow from Radio Drills: Scripted, Repeated Communication Practice

Emergency responders don’t rely on “winging it” in crises. They run radio drills: short, scripted exercises to practice clarity, brevity, and confirmation loops.

Tech teams can borrow this pattern.

Design Radio-Style Reliability Drills

Create tiny scripts for your calm incidents, focused on communication:

  • Opening declaration:
    • “This is a practice incident. Incident Commander: Alex. Scenario: Database latency spikes. Timebox: 10 minutes.”
  • Check-ins:
    • “IC to Tech Lead: what’s your current hypothesis?”
    • “Comms to IC: any update to customer-facing status?”
  • Handoffs:
    • “IC transferring command to Dana as new IC. Time is 14:05. Dana, please repeat your understanding.”
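Because the scripts are meant to sound identical every time, they can live as templates in a bot or runbook. A minimal sketch; the function names are hypothetical and the wording is copied from the examples above.

```python
def opening(ic: str, scenario: str, timebox_min: int) -> str:
    # Same fields, same order, every drill
    return (f"This is a practice incident. Incident Commander: {ic}. "
            f"Scenario: {scenario}. Timebox: {timebox_min} minutes.")


# (sender, receiver) -> the standard question for that check-in
CHECK_INS = {
    ("IC", "Tech Lead"): "what's your current hypothesis?",
    ("Comms", "IC"): "any update to customer-facing status?",
}


def check_in(sender: str, receiver: str) -> str:
    return f"{sender} to {receiver}: {CHECK_INS[(sender, receiver)]}"


def handoff(new_ic: str, time_hhmm: str) -> str:
    # Ends with an explicit request for a read-back
    return (f"IC transferring command to {new_ic} as new IC. Time is {time_hhmm}. "
            f"{new_ic}, please repeat your understanding.")


print(opening("Alex", "Database latency spikes", 10))
print(check_in("IC", "Tech Lead"))
print(handoff("Dana", "14:05"))
```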

Keep these drills short and repetitive. The focus is not on solving complex technical puzzles; it’s on:

  • Speaking clearly under time pressure.
  • Confirming understanding (“read-backs”).
  • Avoiding jargon when communicating externally.
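A facilitator can check read-backs against the key facts of a handoff. This is a deliberately crude sketch, assuming the facts are listed up front as short strings; the fact names and values are hypothetical.

```python
def missing_from_readback(facts: dict[str, str], readback: str) -> list[str]:
    """Return the names of key facts the listener did not repeat back (case-insensitive)."""
    heard = readback.lower()
    return [name for name, value in facts.items() if value.lower() not in heard]


# Hypothetical handoff facts and the new IC's read-back
facts = {"new_ic": "Dana", "time": "14:05", "hypothesis": "connection pool exhaustion"}
readback = "Dana is IC as of 14:05. Working theory: connection pool exhausted."
print(missing_from_readback(facts, readback))   # ['hypothesis']
```

Exact substring matching is strict: “exhausted” versus “exhaustion” gets flagged even though a human listener would probably accept it. Treat the output as a prompt for the debrief, not a verdict.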

Run them often (5–15 minutes, every week or every other week) so that during real incidents, the phrases come out almost automatically.


Equip Teams with Realistic Communication Tools

You can’t practice good communication with unreliable tools. The more friction in the tooling, the more stress in the incident.

For realistic calm-incident drills, ensure:

  • Standard, agreed channels:

    • Primary: incident Slack/Teams channel or dedicated bridge.
    • Secondary: fallback channel if primary fails.
  • Reliable access:

    • Everyone knows how to join calls quickly.
    • Calendar events or ticket templates include the channel/bridge links.
  • Incidents dashboard or bot helpers:

    • A bot that creates channels, posts templates, and reminds people of roles.
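Here is a sketch of the message such a bot might post when a drill starts, assuming hypothetical channel names and `example.com` links. It only builds the text. Posting it through your chat platform's API, and deciding whether a channel is healthy, are left out.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    name: str
    link: str
    healthy: bool = True   # set by whatever health check you trust (not shown)


def kickoff_message(ticket_id: str, roles: dict[str, str],
                    primary: Channel, secondary: Channel) -> str:
    """Build the first message a hypothetical incident bot would post."""
    use_primary = primary.healthy
    channel = primary if use_primary else secondary
    lines = [f"[DRILL] {ticket_id}: join {channel.name} ({channel.link})"]
    if use_primary:
        lines.append(f"If this channel fails, move to {secondary.name} ({secondary.link})")
    else:
        lines.append(f"Primary channel {primary.name} unavailable; using fallback")
    lines.append("Roles:")
    lines += [f"  {role}: {person}" for role, person in roles.items()]
    return "\n".join(lines)


primary = Channel("#inc-drill-payments", "https://chat.example.com/inc-drill-payments")
secondary = Channel("bridge-b", "https://meet.example.com/bridge-b")
print(kickoff_message("INC-DRILL-1", {"IC": "Alex", "Comms": "Dana"}, primary, secondary))
```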

If your organization uses physical tools (such as radios or headsets) for on-prem operations, integrate them into practice:

  • Test audio quality, battery life, and coverage.
  • Practice concise radio-style messages.

For remote or hybrid teams, the “radio” is your chat + video stack. The principle is the same: make sure the tools are boringly reliable, so the practice focuses on people, not infrastructure.


Timeboxes as Risk-Management, Not Just Scheduling

The “cardboard booth” metaphor is about selling tiny time slots: you buy a small, predictable risk window where people can experiment and learn.

Timeboxing isn’t just a scheduling convenience; it’s a risk-management technique:

  • Risk of overrun: A real incident can consume hours; a calm incident is intentionally capped.
  • Risk of burnout: Short practice slots reduce the emotional load.
  • Risk of disruption: People know exactly what they’re committing to.
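Capping overrun risk only works if the cap is actually enforced. Here is a minimal sketch of a hard-stop timebox in Python. `Timebox`, `run_drill`, and `FakeClock` are hypothetical helpers, and the clock is injectable so the logic can be rehearsed without waiting ten real minutes.

```python
import time
from typing import Callable


class Timebox:
    """A hard cap on drill length; the clock is injectable so it can be simulated."""

    def __init__(self, minutes: float, clock: Callable[[], float] = time.monotonic):
        self.limit_s = minutes * 60
        self.clock = clock
        self.started = clock()

    def elapsed_min(self) -> float:
        return (self.clock() - self.started) / 60

    def expired(self) -> bool:
        return self.clock() - self.started >= self.limit_s


def run_drill(box: Timebox, steps: list[Callable[[], bool]]) -> dict:
    """Run steps in order; a step returns True once the simulated incident is mitigated."""
    for i, step in enumerate(steps):
        if box.expired():
            # Cap reached: the drill ends here, resolved or not
            return {"resolved": False, "steps_run": i, "minutes": box.elapsed_min()}
        if step():
            return {"resolved": True, "steps_run": i + 1, "minutes": box.elapsed_min()}
    return {"resolved": False, "steps_run": len(steps), "minutes": box.elapsed_min()}


class FakeClock:
    """Simulated clock (seconds) for rehearsing the logic without waiting."""

    def __init__(self) -> None:
        self.t = 0.0

    def __call__(self) -> float:
        return self.t


clock = FakeClock()
box = Timebox(10, clock=clock)


def investigate() -> bool:
    clock.t += 6 * 60      # each hypothetical step takes six simulated minutes
    return False           # ...and never finds the fix


result = run_drill(box, [investigate] * 5)
print(result)
```

The check runs between steps, so a step already in progress can carry the drill past the cap (12 simulated minutes against a 10-minute box in this run). A human facilitator can cut in mid-step; the sketch only shows where the drill must stop.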

How to Use Timeboxes for Calm Incidents

  1. Define strict time limits per drill:

    • 5–10 minutes for basic comms drills.
    • 15–25 minutes for simple technical scenarios.
    • 30–45 minutes for complex, multi-team simulations.
  2. Treat the timebox as a hard constraint:

    • When time is up, the drill ends—even if the “incident” is unresolved.
    • Use the debrief to explore what happened when time ran out.
  3. Capture uncertainty:

    • How much time did it take to stabilize the simulated service?
    • Were estimates wildly off?
    • What would have happened if this were real?

This makes time visible as a variable with uncertainty, rather than an afterthought. Over multiple drills, you’ll build a better sense of how long different incident types really take to diagnose and mitigate.
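One way to make that variable visible is to log, per drill, the estimated and actual minutes to mitigation and summarize them by incident type. A sketch under that assumption: the incident types and numbers below are invented sample data, and `None` marks a drill that hit its timebox unresolved.

```python
from collections import defaultdict
from statistics import median

# Invented sample data: (incident_type, estimated_min, actual_min or None if time ran out)
drills = [
    ("db_latency", 8, 12),
    ("db_latency", 10, 9),
    ("db_latency", 8, None),
    ("region_failover", 15, 28),
    ("region_failover", 20, 24),
]


def summarize(records):
    by_type = defaultdict(list)
    for kind, estimate, actual in records:
        by_type[kind].append((estimate, actual))
    summary = {}
    for kind, rows in by_type.items():
        done = [(e, a) for e, a in rows if a is not None]
        summary[kind] = {
            "runs": len(rows),
            "timed_out": len(rows) - len(done),
            # Typical time to stabilize, over drills that finished inside the timebox
            "median_actual": median(a for _, a in done) if done else None,
            # Above 1 means the team tends to underestimate this incident type
            "median_ratio": median(a / e for e, a in done) if done else None,
        }
    return summary


for kind, stats in summarize(drills).items():
    print(kind, stats)
```

`timed_out` shows how often the timebox won; `median_ratio` shows how far off the estimates were on the drills that finished.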


Start Tiny: Selling Very Short Time Slots First

If you jump straight into hour-long game days, people will feel overwhelmed and resistant. Start micro.

Think of your Cardboard Reliability Ticket Booth as selling:

  • 5-minute tickets for basic comms (one tiny scenario, one clear handoff).
  • 10-minute tickets for a single, simple failure mode.
  • 15-minute tickets for adding a tech element (log dive, metrics check).

Benefits of starting small:

  • Lower psychological barrier: “I can spare 5 minutes” is an easier sell than “I’ll lose half my afternoon.”
  • Higher repetition: You can run many more reps, which is how skills solidify.
  • Clear focus: Each drill targets one skill—declaring, delegating, logging, or closing.

As confidence grows, you can lengthen or chain time slots: two 10-minute drills back-to-back, or a 20-minute main incident followed by a 10-minute debrief.
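The booth's menu and the chaining idea can be sketched as plain data plus two helpers. The names `BOOTH`, `ticket_for`, and `chain` are hypothetical; the slot lengths and descriptions come from the list above.

```python
# The booth's "menu": ticket length in minutes -> what that slot practices
BOOTH = {
    5: "basic comms: one tiny scenario, one clear handoff",
    10: "a single, simple failure mode",
    15: "add a tech element: log dive or metrics check",
}


def ticket_for(minutes: int) -> str:
    """Look up what a slot of this length covers; only menu items are sold."""
    if minutes not in BOOTH:
        raise ValueError(f"no {minutes}-minute ticket; choose from {sorted(BOOTH)}")
    return BOOTH[minutes]


def chain(*slots: tuple[str, int]) -> tuple[list[tuple[int, int, str]], int]:
    """Lay slots back-to-back; return (start_minute, length, label) rows and the total."""
    agenda: list[tuple[int, int, str]] = []
    start = 0
    for label, minutes in slots:
        if minutes <= 0:
            raise ValueError(f"slot {label!r} needs a positive length")
        agenda.append((start, minutes, label))
        start += minutes
    return agenda, start


print(ticket_for(5))
print(chain(("drill A", 10), ("drill B", 10)))
print(chain(("main incident", 20), ("debrief", 10)))
```

Returning start offsets makes it straightforward to paste a chained session into a calendar invite with each handoff point marked.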


Time Constraints as a Core Design Element

In many training scenarios, the schedule is “soft”: if the practice runs over, people just… extend the meeting.

That sends the wrong signal. In reality, time is often the scarcest resource in an incident.

Design each calm-incident scenario around its time constraint:

  • Objective scoped to time:

    • “By minute 8, we must have a clear external status update drafted.”
    • “By minute 12, we must choose a mitigation, even if we’re not fully confident.”
  • Decision points tied to the clock:

    • “At minute 5, IC must decide whether to escalate to another team.”
  • Trade-offs surfaced:

    • “We can keep investigating cause, or we can roll back now. We have 3 minutes to choose.”
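Clock-tied objectives can be written down as checkpoints and scored after the drill from the scribe's timeline. A minimal sketch; `Checkpoint`, `score`, and the logged minutes are hypothetical, and the deadlines are the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    minute: float   # deadline, measured from drill start
    goal: str


# Deadlines taken from the examples above
SCENARIO = [
    Checkpoint(5, "escalation decision"),
    Checkpoint(8, "external status update drafted"),
    Checkpoint(12, "mitigation chosen"),
]


def score(checkpoints: list[Checkpoint], achieved: dict[str, float]) -> tuple[list[str], list[str]]:
    """achieved maps goal -> minute it happened; a goal counts only if met by its deadline."""
    met: list[str] = []
    missed: list[str] = []
    for cp in checkpoints:
        at = achieved.get(cp.goal)
        if at is not None and at <= cp.minute:
            met.append(cp.goal)
        else:
            missed.append(cp.goal)
    return met, missed


# Hypothetical scribe log from one drill (minutes since start)
log = {"escalation decision": 4.5, "external status update drafted": 9.0}
met, missed = score(SCENARIO, log)
print("met:", met)
print("missed:", missed)
```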

Avoid casually extending the drill. Instead:

  • Stop on time.
  • Debrief what it felt like to make decisions in that constraint.
  • Adjust future scenarios to calibrate difficulty, not to erase the time pressure.
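Calibrating difficulty without touching the clock might look like the sketch below. It assumes scenarios are tuned by adding or removing “injects” (extra complications fed to the team mid-drill), and the thresholds (all checkpoints met, fewer than half met) are arbitrary placeholders. The only firm rule the sketch encodes is that `timebox_min` never changes.

```python
def calibrate(scenario: dict, met: int, total: int) -> dict:
    """Return the next run's scenario: tune complications, never the timebox."""
    if total <= 0:
        raise ValueError("scenario needs at least one checkpoint")
    nxt = dict(scenario)
    rate = met / total
    if rate == 1.0:
        nxt["injects"] = scenario["injects"] + 1      # too easy: add one complication
    elif rate < 0.5 and scenario["injects"] > 0:
        nxt["injects"] = scenario["injects"] - 1      # too hard: remove one
    # nxt["timebox_min"] is copied unchanged on purpose
    return nxt


drill = {"name": "payment timeout", "timebox_min": 15, "injects": 2}
print(calibrate(drill, met=1, total=3))
print(calibrate(drill, met=3, total=3))
```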

Over time, people learn that time is real—and they get better at making thoughtful decisions even when the clock is loud.


Putting It All Together

A Cardboard Reliability Ticket Booth isn’t a big program or a fancy tool. It’s a set of simple design choices:

  • Use real ticketing systems to log calm-incident practice.
  • Align with ITIL ticket types so drills fit your existing workflows.
  • Borrow radio-drill patterns for clear, repeatable communication.
  • Equip teams with reliable communication tools and practice using them.
  • Timebox aggressively to manage risk and keep practice small and frequent.
  • Start tiny, then scale up as confidence grows.
  • Treat time constraints as central to scenario design—not an optional detail.

Do this consistently, and your team will arrive at real incidents with skills they’ve already rehearsed in dozens of tiny, controlled windows. The calm won’t be accidental; it will be practiced, logged, and continually refined.

And the cardboard booth? You can still build a literal one, if you like—just a small prop on the wall with sticky-note “tickets” for 5-, 10-, and 15-minute drills. Sometimes a bit of physical theater is exactly what teams need to remember: reliability isn’t only about preventing incidents.

It’s about practicing how to move through them calmly, one tiny time slot at a time.
