The Cardboard Reliability Street Market: Swapping Hand‑Drawn Failure Maps Between Teams in One Afternoon

Most teams treat failures like household fires: scramble to put them out, clean up the mess, and move on as quickly as possible. The incident gets fixed, a retro is held, the doc is filed away in a wiki—and the real learning potential quietly evaporates.

There’s a better way.

Imagine turning every incident into a tangible learning artifact: something you can pick up, sketch on, argue over, and hand to another team. Imagine turning an ordinary afternoon into a buzzing “street market” of reliability experiences where teams trade those artifacts like vendors swapping trade secrets.

That’s the idea behind the Cardboard Reliability Street Market: a simple, low‑cost, high‑impact workshop where teams create and swap hand‑drawn failure maps in one afternoon.

Why Failure Deserves More Than a Quick Fix

Incidents are expensive—both in direct impact and in the attention they command. But they’re also uniquely rich sources of insight:

They reveal how systems actually behave under stress.
They expose real communication paths, not just formal org charts.
They highlight the gap between process-on-paper and process-in-practice.

And yet, teams often:

Rush through retros to “get back to real work.”
Focus narrowly on the technical root cause.
Fail to share insights beyond the directly involved team.

Treating failure as just something to fix wastes a critical asset. Treating failure as something to study and share is what builds long‑term reliability.

The Street Market format is built on a simple premise:

Every incident retrospective produces an artifact that can help prevent future incidents or improve the response to them, often for teams far outside the original blast radius.

What Is a Hand‑Drawn Failure Map?

A failure map is a hand‑drawn, low‑fidelity visual representation of how a specific failure unfolded:

What broke (and in what order)?
Who got paged, and when?
Which systems, tools, dashboards, and runbooks were involved?
What decisions were made, and why?
Where were the confusing moments, delays, or surprises?

Instead of a polished architecture diagram or a formal incident report, a failure map is deliberately rough and human:

Drawn on cardboard, paper, or a whiteboard.
Uses boxes, arrows, stick figures, timestamps, speech bubbles.
Highlights emotions and confusion (“we had no idea where the logs were”).
Captures context that rarely makes it into formal docs.

Why hand‑drawn?

Low barrier to entry: Anyone can sketch. You don’t need diagramming tools or design skills.
Invites conversation: People feel freer to question and annotate a sketch than a polished diagram.
Makes the invisible visible: You can literally see the maze of alerts, chats, and decisions that form the real reliability system.

These maps turn complex, abstract reliability issues into something you can hold, point to, and share.

From Retro Artifact to Knowledge Marketplace

Most organizations already do some form of incident retrospective, but the output often remains siloed:

The doc lives in a team’s folder.
Lessons are not easily discoverable.
Other teams repeat the same mistakes.

The Street Market approach reframes each incident as currency in a reliability marketplace.

A team runs an incident retro.
As part of that retro, they create a failure map of what happened.
That map becomes an asset, stored physically (e.g., cardboard, poster) or digitally (photo of a whiteboard).
Periodically, teams come together with their maps for a structured swap session.

In one afternoon, you create a fast, cross‑team knowledge marketplace for:

Failure modes and how they manifested.
Response patterns that worked well (or poorly).
Process gaps, communication breakdowns, and decision bottlenecks.

Instead of hoping people will read long postmortems, you make learning:

Visual
Social
Time‑boxed

How the Cardboard Street Market Works (Step‑By‑Step)

You can run this as a half‑day workshop with 4–6 teams. Aim for 90–120 minutes of shared time.
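
As a quick sanity check on that time budget, the sketch below adds up the per-step ranges given in the step headings that follow (kickoff, market stalls, tabletop exercises, debrief). The `agenda` dictionary is only an illustrative way to hold those numbers; preparation is left out because it happens before the workshop.

```python
# Step durations in minutes (low, high), copied from the step headings.
agenda = {
    "Kickoff": (10, 15),
    "Market stalls": (30, 40),
    "Tabletop exercises": (30, 40),
    "Group debrief": (15, 20),
}

shortest = sum(low for low, _ in agenda.values())
longest = sum(high for _, high in agenda.values())

print(f"Shared time needed: {shortest} to {longest} minutes")
# Shared time needed: 85 to 115 minutes
```

The total lands a few minutes under the 90–120 minute target, which leaves a small buffer for moving between stalls and settling into groups.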

1. Preparation (Before the Workshop)

Ask each participating team to:

Select 1–2 recent incidents (not necessarily the biggest ones, just representative).
Create a failure map for each incident.
- 20–30 minutes per map.
- Keep it simple: sequence, key players, tools, decisions, pain points.
Bring the maps physically (cardboard, large paper) or print them.

Optionally, provide a simple template:

Timeline across the top.
Systems and components in the middle.
People and communication channels along the bottom.
Pain points highlighted with red markers or sticky notes.
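
If you later want to keep a digital record alongside the photo of each map, the four template rows translate naturally into a small data structure. The following is a minimal sketch, not a prescribed schema: the class names, field names, and the example incident are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TimelineEvent:
    """One timestamped moment from the top row of the map."""
    time: str          # free-form, e.g. "14:05"; no format is prescribed
    description: str


@dataclass
class FailureMap:
    """Hypothetical digital record mirroring the four template rows."""
    title: str
    timeline: list[TimelineEvent] = field(default_factory=list)
    systems: list[str] = field(default_factory=list)
    people_and_channels: list[str] = field(default_factory=list)
    pain_points: list[str] = field(default_factory=list)

    def missing_rows(self) -> list[str]:
        """Return the names of template rows that are still empty."""
        rows = {
            "timeline": self.timeline,
            "systems": self.systems,
            "people_and_channels": self.people_and_channels,
            "pain_points": self.pain_points,
        }
        return [name for name, content in rows.items() if not content]


# Made-up example: a map that is only half filled in.
example = FailureMap(
    title="Checkout latency spike",
    timeline=[TimelineEvent("14:05", "First alert fires")],
    systems=["checkout-api", "payments-db"],
)
print(example.missing_rows())  # ['people_and_channels', 'pain_points']
```

A check like `missing_rows()` is a gentle nudge during preparation: a map with no pain points recorded is usually a map that hasn't been discussed yet.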

2. Kickoff: Ground Rules and Intent (10–15 minutes)

Set the tone:

Blamelessness: We’re learning from systems and processes, not judging individuals.
Psychological safety: No shaming; questions are for understanding, not for “gotchas.”
Exploration over diagnosis: We’re not here to fix these old incidents; we’re here to learn patterns we can re‑use.

Explain the format briefly so everyone knows what to expect.

3. Market Stalls: Teams Present Their Failure Maps (30–40 minutes)

Set up the room like a small market:

Each team gets a “stall” (a table or wall space) to display their map(s).
Split the group in half:
- Half stay at their stalls as “vendors” (explaining their incident).
- Half become “visitors” (walking around and asking questions).

Give visitors 8–10 minutes per stall, then rotate. Prompts for discussion:

“Where did you first realize something was wrong?”
“What made this incident harder than it needed to be?”
“What surprised you while mapping this out?”
“What do you wish had existed before this incident?”

Visitors annotate maps with sticky notes:

Similar incidents they’ve seen.
Ideas for preventing or shortening this type of failure.
Notable process or communication issues.

Then swap roles: vendors become visitors and vice versa.
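
If you'd rather not improvise the rotation on the day, it can be planned ahead. The sketch below assumes one reading of the format: every team leaves half its members at its own stall and sends the other half out as a single visitor group, and the 30–40 minute slot covers both role halves. The function name and team names are hypothetical.

```python
def stall_rotation(teams, rounds):
    """Return, for each round, which team's stall each visitor group visits.

    In round r (1-based), visitors from teams[i] go to the stall of
    teams[(i + r) % n]. Offsets 1..n-1 never point at a group's own
    stall and never send two groups to the same stall in one round.
    """
    n = len(teams)
    if not 1 <= rounds <= n - 1:
        raise ValueError("rounds must be between 1 and len(teams) - 1")
    return [
        {teams[i]: teams[(i + r) % n] for i in range(n)}
        for r in range(1, rounds + 1)
    ]


teams = ["A", "B", "C", "D"]   # hypothetical team names
half_session = 40 // 2         # a 40-minute slot shared by the two role halves
rounds = half_session // 10    # 10 minutes per stall gives 2 rounds per half

for number, plan in enumerate(stall_rotation(teams, rounds), start=1):
    print(number, plan)
# 1 {'A': 'B', 'B': 'C', 'C': 'D', 'D': 'A'}
# 2 {'A': 'C', 'B': 'D', 'C': 'A', 'D': 'B'}
```

Under these assumptions each visitor sees roughly two stalls before the role swap, not every stall in the room, so it is worth telling people up front that the sticky notes left on the maps are how the stalls they missed still reach them.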

4. Tabletop Exercises: Low‑Cost Reliability Simulations (30–40 minutes)

Now you shift from past incidents to hypothetical scenarios inspired by the maps.

Form mixed teams (people from different original teams) and give each group:

One failure map from the earlier session.
A “what if” scenario: a twist or variation on the underlying failure mode.

For example:

“What if this failure happened during a major product launch?”
“What if the primary on‑call were out sick?”
“What if observability tools were degraded too?”
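
Forming the mixed teams is easy to do by hand, but a small script keeps it fair when attendance is uneven. The sketch below deals people into groups round-robin, one original team at a time, and pairs each group with one of the example scenarios above. The roster, names, and function name are made up for illustration.

```python
from itertools import cycle


def mixed_groups(roster, group_count):
    """Deal people into groups round-robin, one original team at a time.

    Teammates are dealt to consecutive groups, so a team with k members
    is spread over min(k, group_count) different groups.
    """
    groups = [[] for _ in range(group_count)]
    slots = cycle(range(group_count))
    for team, members in roster.items():
        for person in members:
            groups[next(slots)].append((person, team))
    return groups


# Hypothetical roster: original team -> attendees.
roster = {
    "Payments": ["Ana", "Ben"],
    "Search": ["Chi", "Dev"],
    "Infra": ["Eli", "Fay"],
}
scenarios = [
    "Failure happens during a major product launch",
    "Primary on-call is out sick",
    "Observability tools are degraded too",
]

groups = mixed_groups(roster, group_count=3)
for group, scenario in zip(groups, scenarios):
    names = ", ".join(f"{person} ({team})" for person, team in group)
    print(f"{scenario}: {names}")
```

With this roster, every group ends up with people from two different original teams, which is the point of mixing: nobody gets to rehearse only their own incident.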

Ask each group to talk through their response as if it were happening now:

Who would they page?
What dashboards or logs would they check first?
How would they communicate with stakeholders?
What decision points would be most stressful or ambiguous?

This is a tabletop exercise: no infrastructure, no chaos engineering, just conversation. The point is to:

Practice mental models for responding to failure.
Surface unclear ownership, missing tools, or brittle processes.
Spot communication gaps before the next real incident.

Have groups capture:

2–3 things that would make this hypothetical incident easier to handle.
1–2 cross‑team improvements (runbooks, shared dashboards, process tweaks).

5. Group Debrief: From Stories to Systemic Improvements (15–20 minutes)

Bring everyone back together. Ask:

What patterns did you see across multiple teams’ failures?
Which failure modes seem to repeat in different guises?
Where did communication and decision‑making slow things down?
What small, realistic changes could we make in the next month?

Capture themes on a shared board:

Tooling gaps (missing alerts, poor dashboards).
Process gaps (no clear incident commander, unclear escalation paths).
Knowledge gaps (runbooks outdated or nonexistent).

Commit to a handful of follow‑up actions—ideally cross‑team ones.
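
If someone transcribes the sticky notes afterwards, a simple tally by theme shows where the room's attention went and can help pick those follow‑up actions. This is only a sketch: the notes below are invented, and the three theme labels simply mirror the categories on the shared board.

```python
from collections import Counter

# Made-up sticky notes as (theme, text) pairs.
notes = [
    ("process", "No clear incident commander"),
    ("tooling", "Alert fired 20 minutes late"),
    ("process", "Escalation path unclear after hours"),
    ("knowledge", "Runbook referenced a retired service"),
    ("tooling", "Dashboard lacked a per-region view"),
    ("process", "Stakeholder updates were ad hoc"),
]


def top_themes(notes, limit=3):
    """Count notes per theme and return the most frequent themes first."""
    return Counter(theme for theme, _ in notes).most_common(limit)


print(top_themes(notes))
# [('process', 3), ('tooling', 2), ('knowledge', 1)]
```

Treat the counts as a conversation starter, not a ranking: a single note about a missing runbook can matter more than three about dashboard layout.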

Why This Works: The Hidden Benefits

The Cardboard Reliability Street Market is deliberately low‑cost and low‑stakes, but the benefits compound quickly.

1. Reliability Skills in an Afternoon

Because it’s short and focused, teams:

Practice incident thinking without being on fire.
Learn how other teams detect, triage, and communicate.
Get uncomfortable truths out in the open while everyone is calm.

You’re training reliability as a muscle, not just a reaction.

2. Tangible, Reusable Artifacts

The failure maps themselves become long‑lived assets:

Hang them in team spaces as reminders.
Use them to onboard new engineers.
Digitize them into a searchable catalog of “how things really fail here.”

Months later, a new incident will happen and someone will remember: “This looks a lot like that cardboard map we saw from Team X.”
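
A searchable catalog does not need to be elaborate to make that moment of recognition more likely. The sketch below assumes each photographed map gets a title and a few hand‑written tags, and matches a query against both; the entries, tags, and file paths are all hypothetical.

```python
# Made-up catalog entries: one per photographed failure map.
catalog = [
    {
        "title": "Checkout latency spike",
        "tags": ["database", "failover", "late alert"],
        "photo": "maps/checkout-latency.jpg",
    },
    {
        "title": "Search index rebuild stall",
        "tags": ["queue backlog", "missing runbook"],
        "photo": "maps/search-rebuild.jpg",
    },
    {
        "title": "Login outage during deploy",
        "tags": ["failover", "rollback", "missing runbook"],
        "photo": "maps/login-outage.jpg",
    },
]


def search_catalog(catalog, query):
    """Return titles of maps whose title or tags contain every query word."""
    words = query.lower().split()
    hits = []
    for entry in catalog:
        haystack = " ".join([entry["title"], *entry["tags"]]).lower()
        if all(word in haystack for word in words):
            hits.append(entry["title"])
    return hits


print(search_catalog(catalog, "failover"))
# ['Checkout latency spike', 'Login outage during deploy']
print(search_catalog(catalog, "failover runbook"))
# ['Login outage during deploy']
```

Plain substring matching is a deliberate choice here: with a few dozen maps, agreeing on a handful of shared tags matters more than search sophistication.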

3. Cross‑Team Empathy and Shared Language

By walking through each other’s incidents, teams:

Understand dependencies and constraints.
Learn shared vocabulary for reliability issues.
Build empathy for the pressures others face during incidents.

This often pays off in future cross‑team incidents: people know who to call, and how to talk about the problem.

4. Surfacing Gaps Before They Hurt You

The tabletop simulations are especially powerful for uncovering:

Broken or unclear escalation paths.
Single points of knowledge (only one person knows how X works).
Overloaded tools or processes that would fail under pressure.

You’re intentionally looking for where the current system will crack next—when there’s still time to change it.

Getting Started: Keep It Scrappy

You don’t need executive sponsorship or a big budget to try this.

You need:

Cardboard or large paper
Markers, sticky notes, tape
4–6 incidents’ worth of experience
2–3 hours of protected time

Start small:

Pilot with two or three teams.
Time‑box aggressively; imperfect is fine.
Gather feedback and iterate on the format.
Make it a recurring event (quarterly works well).

Over time, you’ll build a living library of how your systems fail and how your people respond—captured not just in docs, but in shared experience.

Conclusion: Turning Failure into a Shared Asset

Failures are already happening. Incidents are already eating your time. The only real question is whether you’re extracting the full value from them.

By turning retrospectives into hand‑drawn failure maps, and those maps into a Cardboard Reliability Street Market, you:

Treat failure as a powerful learning tool, not just a nuisance.
Transform isolated incidents into reusable learning artifacts.
Build a fast, cross‑team knowledge marketplace in an afternoon.
Use low‑stakes, tabletop‑style exercises to probe your processes before the next crisis hits.

You can’t eliminate failure—but you can make sure every failure you’ve already paid for keeps teaching you, long after the incident is resolved.

Rain Lag
