Rain Lag

The Paper-Only Incident Campfire: Telling Outage Stories Instead of Running Status Meetings

How to replace rigid outage reviews with a paper-only story circle that uses SRE-style postmortems, narrative, and psychological safety to turn incidents into powerful shared learning moments.

If your outage reviews feel like awkward courtroom proceedings where everyone is afraid to speak, you’re not alone. Many teams try to “do postmortems” and end up with ritualized status meetings that produce timelines, tickets, and action items—but very little real learning.

A different model is emerging: the paper-only incident campfire.

Instead of a rigid, slide-driven status meeting, you gather in a circle, physically or virtually, to hear the story of the incident from the people who lived it. No slide decks, no dashboards, no Jira boards. Just a written postmortem, printed or shared as plain text, read together like a story and then discussed as a group.

This simple shift in format, grounded in SRE-style postmortems and psychological safety, can transform how your team thinks about failure.


Why Traditional Outage Reviews Don’t Work

Many post-incident reviews fail for predictable reasons:

  • They’re run like performance evaluations instead of learning sessions.
  • Leaders subtly (or openly) signal that they’re hunting for “the mistake.”
  • Attendees fixate on proving they did the right thing, not on exploring what actually happened.
  • The conversation is dominated by technical minutiae and status checklists, not human decision-making.

In this environment, people:

  • Withhold details that might make them look careless.
  • Quietly avoid mentioning uncertainty or confusion.
  • Rewrite the story in hindsight to make decisions look more rational and less risky.

The result: a clean-looking document that hides the messy reality of how incidents really unfold—and a system that is no safer next time.


The SRE Foundation: Blameless, Systematic Postmortems

Before we get to campfires, we need solid fuel: SRE-style postmortems.

Done well, SRE postmortems:

  • Systematically document the incident: what happened, when, who was involved, what users saw, what changed.
  • Uncover root causes (plural): not just “the bug,” but contributing conditions, missing signals, process gaps, and misaligned incentives.
  • Design changes to prevent recurrence: technical fixes, runbook updates, better alerting, training, process changes.

Critically, they’re blameless.

Blameless doesn’t mean “nobody is ever accountable” or “we ignore negligence.” It means:

We assume every person involved was doing their best with the information, tools, and constraints they had at the time.

Instead of asking, “Who messed up?” you ask:

  • What made this action reasonable at the time?
  • What information was missing or misleading?
  • What pressures (time, load, expectations) shaped decisions?

This framing shifts focus from individuals to systems, making it possible to learn.


From Status Meeting to Story Circle

Most incident reviews revolve around slides, dashboards, and tickets. The campfire format flips that.

What is a “Paper-Only Incident Campfire”?

It’s a narrative review ritual with three key properties:

  1. Paper-first (or text-first): The SRE-style postmortem is written before the meeting and shared as a narrative. During the campfire, you read and discuss it—no status decks, no live log spelunking.
  2. Story circle: Everyone gathers at the same level (chairs in a circle if you’re in person; cameras on and equal speaking time if you’re remote). The emphasis is on telling and understanding the story of the incident.
  3. Conversation over presentation: The goal isn’t to present what happened, but to explore how it unfolded—especially the human decisions and surprises.

The campfire is where the postmortem becomes a shared learning artifact, not just a PDF in a drive.


Designing the Narrative Postmortem

To support this format, your postmortem should read more like a story than a checklist.

Include the usual SRE elements:

  • Summary and impact
  • User-visible symptoms
  • Timeline of events
  • Root causes and contributing factors
  • Remediations and follow-ups

But expand the narrative around human decision-making:

  • What did people see or believe at each stage?
  • What cues were confusing, ambiguous, or missing?
  • What trade-offs were teams juggling (e.g., speed vs. safety)?
  • Where did assumptions or mental models diverge from reality?

This narrative becomes the script for your campfire.
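One way to keep the SRE elements and the human narrative side by side is a small template that renders to plain text for printing. The sketch below is only an illustration: the class names, field names, and the sample incident are all invented here, not taken from any standard postmortem tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TimelineEntry:
    time: str         # time as written in the postmortem, e.g. "14:02"
    event: str        # what happened
    belief: str = ""  # what responders saw or believed at that moment


@dataclass
class NarrativePostmortem:
    title: str
    summary: str
    symptoms: List[str] = field(default_factory=list)
    timeline: List[TimelineEntry] = field(default_factory=list)
    factors: List[str] = field(default_factory=list)
    follow_ups: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Return a plain-text version suitable for printing."""
        out = [self.title, "", "Summary and impact", self.summary, ""]
        out.append("User-visible symptoms")
        out += [f"  - {s}" for s in self.symptoms]
        out += ["", "Timeline of events"]
        for entry in self.timeline:
            out.append(f"  {entry.time}  {entry.event}")
            if entry.belief:
                # The human side of the story sits next to the event itself.
                out.append(f"    At the time: {entry.belief}")
        out += ["", "Root causes and contributing factors"]
        out += [f"  - {f}" for f in self.factors]
        out += ["", "Remediations and follow-ups"]
        out += [f"  - {a}" for a in self.follow_ups]
        return "\n".join(out)


def example_postmortem() -> NarrativePostmortem:
    """Build an invented incident purely for illustration."""
    return NarrativePostmortem(
        title="Checkout latency incident (hypothetical)",
        summary="Checkout requests were slow for about 40 minutes.",
        symptoms=["Slow page loads at checkout"],
        timeline=[
            TimelineEntry("14:02", "Latency alert fired",
                          "Looked like a routine traffic spike."),
            TimelineEntry("14:20", "Service restarted",
                          "Restart had resolved a similar alert before."),
        ],
        factors=["Dashboard did not show queue depth"],
        follow_ups=["Add queue depth to the on-call dashboard"],
    )


if __name__ == "__main__":
    print(example_postmortem().render())
```

Putting the "At the time" line directly under each event is a design choice: it gives the reader a natural place to pause at decision points when the document is read aloud.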


Running the Campfire: Step-by-Step

Here’s a practical structure you can use.

1. Set the Frame Explicitly

The facilitator (often an SRE or incident manager) opens with a clear statement:

  • This is a learning session, not a performance review.
  • We are not here to assign blame.
  • We are here to understand: “How did this make sense at the time?”

Also state that no corrective actions will be assigned during the session. This lowers the worry that speaking up will create instant extra work or personal consequences.

2. Read the Story Together

Have someone (often the incident lead) read the narrative sections of the postmortem aloud or walk through them slowly, pausing:

  • at key decision points,
  • where things were confusing,
  • where timelines sped up or slowed down.

Invite people who were there to add color:

  • “What were you seeing on your screen?”
  • “What did you think was happening then?”
  • “What did you feel most uncertain about at that point?”

The goal is to reconstruct the lived experience, not just the log of events.

3. Ask Curiosity-Driven Questions

Frame questions to spark curiosity, not judgment. Instead of:

  • “Why did you restart the service without a canary?”

try:

  • “What made the restart seem like the best option at the time?”
  • “What signals suggested it would be safe?”
  • “Were there constraints or pressures we aren’t seeing in the timeline?”

This language matters. It signals that you expect reasonable behavior in a complex system, not perfection.

4. Explore the System, Not the Person

When you find an “error,” keep going:

  • What training would have changed this decision?
  • What documentation was missing or outdated?
  • What tooling could have surfaced the risk earlier?
  • What cultural expectations nudged people toward speed vs. caution?

Map these to systemic changes, not admonitions like “Be more careful next time.”

5. Validate and Evolve the Response Plan

Use the story to evaluate not just the failure, but also the response process:

  • Did alerts fire when and where they should have?
  • Did ownership and on-call paths work as expected?
  • Were communication channels (Slack, Zoom, incident room) effective?
  • Did the runbook match reality, or did people improvise?

The postmortem becomes a mirror for your incident response plan. If people repeatedly deviated from the playbook to succeed, the playbook—not the people—needs updating.


Psychological Safety: The Critical Ingredient

Simply changing the format is not enough.

You can sit in a circle with printed postmortems and still end up with:

  • people self-censoring,
  • leaders subtly signaling disapproval,
  • engineers gaming the narrative to minimize their exposure.

To make the campfire work, you must actively reduce fear of repercussion, failure, and judgment.

Some concrete practices:

  • Leaders go first. Have managers and senior engineers openly share their own mistakes and near-misses.
  • Ban “who did this?” questions. Refocus on “what conditions led here?”
  • Protect against weaponization. Make it explicit that postmortem content is not used in performance reviews.
  • Normalize uncertainty. Praise people for highlighting confusion, not just confident actions.

Over time, this builds a culture where admitting mistakes and asking basic questions is safe—and even expected.


Turning Campfire Stories into Lasting Artifacts

The outcome of a campfire isn’t just a feel-good conversation. It’s a refined artifact the whole organization can learn from.

After the session:

  1. Update the postmortem with insights from the discussion.
  2. Highlight key narrative moments: surprising signals, ambiguous cues, coordination challenges.
  3. Capture agreed systemic changes and hypotheses (“We believe this change will reduce detection time by X”).
  4. Share broadly: in an internal knowledge base, learning newsletter, or SRE guild.
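A hypothesis like the one in step 3 is easier to revisit if it is written in a checkable form. The sketch below is a rough illustration under stated assumptions: detection time is taken as the gap between when impact started and when the team noticed, the median is used as the summary (a choice made here, not one the format prescribes), and every timestamp and threshold is invented. With only a handful of incidents, treat the result as a conversation starter rather than proof.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import List, Tuple

Incident = Tuple[datetime, datetime]  # (impact started, team detected it)


def time_to_detect(started: datetime, detected: datetime) -> timedelta:
    """Gap between the start of impact and the moment it was noticed."""
    if detected < started:
        raise ValueError("detection time precedes start time")
    return detected - started


def median_detect_minutes(incidents: List[Incident]) -> float:
    """Median detection time, in minutes, across a set of incidents."""
    minutes = [time_to_detect(s, d).total_seconds() / 60 for s, d in incidents]
    return median(minutes)


def hypothesis_holds(before: List[Incident], after: List[Incident],
                     expected_reduction_minutes: float) -> bool:
    """True if median detection time dropped by at least the expected amount."""
    reduction = median_detect_minutes(before) - median_detect_minutes(after)
    return reduction >= expected_reduction_minutes


def _incident(start: str, minutes_to_detect: int) -> Incident:
    """Helper for building invented sample data."""
    began = datetime.fromisoformat(start)
    return began, began + timedelta(minutes=minutes_to_detect)


# Hypothetical incidents before and after an alerting change.
BEFORE = [
    _incident("2024-01-05T10:00", 20),
    _incident("2024-02-11T03:30", 30),
    _incident("2024-03-02T16:45", 40),
]
AFTER = [
    _incident("2024-05-09T09:15", 10),
    _incident("2024-06-21T22:00", 12),
    _incident("2024-07-30T13:05", 20),
]

if __name__ == "__main__":
    print("Median before (min):", median_detect_minutes(BEFORE))
    print("Median after (min):", median_detect_minutes(AFTER))
    print("Reduced by at least 15 min:", hypothesis_holds(BEFORE, AFTER, 15))
```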

Over time, these narrative postmortems become:

  • Onboarding material for new engineers.
  • Realistic examples for training and game days.
  • A visible record of your evolution in reliability and culture.

Bringing the Campfire to Your Team

You don’t need a full cultural overhaul to start. Try this for your next significant incident:

  1. Write a blameless, narrative-style postmortem.
  2. Schedule a 60-minute “incident campfire” with all key participants and stakeholders.
  3. Explicitly frame it as a story circle and learning session.
  4. Run the format once, then gather feedback.

Ask people afterward:

  • Did you feel safe sharing what really happened?
  • Did you learn something you didn’t know before?
  • Did we generate changes that will actually improve our system or process?

Use that feedback to tune the ritual.


Conclusion: From Fear to Curiosity

Incidents are inevitable. Wasted incidents—those we rush past with shallow reviews and quiet blame—are optional.

The paper-only incident campfire is a simple but powerful way to:

  • Anchor your practice in SRE-style, blameless postmortems.
  • Shift from status reporting to storytelling.
  • Center human decision-making as a legitimate, examinable part of the system.
  • Build psychological safety, so people share what really happened.
  • Turn each outage into a shared learning artifact that makes your team and systems stronger.

When you sit in a circle, put the paper in the middle, and tell the story together, you stop asking, “Who failed?” and start asking, “What can we learn?”

That’s where real reliability begins.
