The Analog Incident Story Card Catalog: Filing Failures Into a System You Can Actually Browse
How to turn production incidents and failures into a browsable, physical card catalog that your team actually uses to learn, spot patterns, and improve systems over time.
Introduction
Most teams say they want to learn from failure. They run incident reviews, write post‑mortems, and share links. Then those write‑ups disappear into Confluence, Google Drive, or some forgotten incident tool.
You can’t learn from what you can’t see.
This is where an intentionally low‑tech idea becomes surprisingly powerful: build an analog incident story card catalog — a physical, library‑style card catalog filled with short, structured incident summaries. Each card is a compact story about a failure, filed into a consistent classification system you can flip through with your hands.
This isn’t nostalgia for paper. It’s a way to make failure visible, browsable, and impossible to quietly ignore.
In this post, we’ll walk through:
- What an incident story card catalog is
- How to design the cards as mini post‑mortems
- How to organize and file them using a taxonomy
- How to borrow structure from frameworks like HFACS‑Healthcare
- How to keep the catalog alive, used, and connected to change
What Is an Incident Story Card Catalog?
Think of a library card catalog, but every card represents an incident:
- A production outage
- A serious near‑miss in healthcare
- A security breach attempt
- A failed deployment
Each card is a compact narrative of what happened, why it happened, what the impact was, and what was learned. Cards are filed in drawers using a taxonomy that reflects your system: technical failures, human factors, process gaps, organizational issues, and so on.
You’re building a tangible archive of system memory.
Why analog?
- Friction in the right place: Creating a card forces summarization and reflection.
- Zero context switching: No logins, no search strings, just open a drawer and browse.
- Psychological weight: Physically seeing “a drawer full of auth outages” carries a different emotional impact than a filtered online list.
- Shared experience: People gather around drawers; serendipitous “oh wow, remember this one?” conversations happen.
The point is not to replace digital records, but to curate them into a format that invites revisiting.
Designing Each Card as a Mini Post‑Mortem
Each card should read like a short story, not a sterile ticket. Aim for clarity and learning, not legalese or blame.
A practical card layout might include:
Front of the card
- Title – A human‑readable name: “The Tuesday Morning Auth Meltdown”
- Date – When the incident occurred
- Systems / Domains – e.g., Auth, Payments, ICU, Lab Orders
- Impact summary – 1–2 sentences in plain language:
- “Users could not log in for 37 minutes; ~12k failed login attempts.”
- “Lab results for 23 patients were delayed by ~3 hours.”
- Primary category tag – From your taxonomy (e.g., Human Factors → Attention Management)
Back of the card
- What happened (narrative) – 3–5 sentences:
- Key timeline
- How it surfaced
- How it ended
- Why it happened (contributing factors) – bullets tied to your taxonomy
- What we learned – 3–5 concrete insights
- Follow‑up actions – the 2–3 key changes made
- Reference – link or ID of full digital post‑mortem
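If you keep a digital index next to the drawers (the Reference field already points at a full post‑mortem), it can help to mirror the card fields in one machine‑readable shape. A minimal sketch in Python; the class name, field names, and example values are illustrative, not a prescribed schema:

```python
# Digital mirror of the physical card fields. All names are illustrative.
from dataclasses import dataclass
from datetime import date

@dataclass
class IncidentCard:
    # Front of the card
    title: str                       # "The Tuesday Morning Auth Meltdown"
    occurred_on: date
    systems: list[str]               # e.g., ["Auth"], ["Payments"], ["ICU"]
    impact_summary: str              # 1-2 plain-language sentences
    primary_category: str            # e.g., "Human Factors / Attention Management"
    # Back of the card
    narrative: str                   # 3-5 sentences: timeline, surfacing, resolution
    contributing_factors: list[str]  # bullets tied to your taxonomy
    lessons: list[str]               # 3-5 concrete insights
    follow_ups: list[str]            # the 2-3 key changes made
    reference: str                   # link or ID of the full digital post-mortem

# Example values below are invented for illustration.
card = IncidentCard(
    title="The Tuesday Morning Auth Meltdown",
    occurred_on=date(2024, 3, 12),
    systems=["Auth"],
    impact_summary="Users could not log in for 37 minutes; ~12k failed login attempts.",
    primary_category="Human Factors / Attention Management",
    narrative="A deploy overlapped an early on-call rotation; incomplete handoff "
              "notes delayed diagnosis until a rollback restored logins.",
    contributing_factors=["Incomplete handoff notes", "No alert on login error rate"],
    lessons=["Handoffs need a written checklist", "Login failures were invisible"],
    follow_ups=["Handoff template updated", "Login error-rate alert added"],
    reference="PM-2024-031",
)
```

Exporting this structure to your card‑stock template keeps the physical and digital copies from drifting apart.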
Keep the tone matter‑of‑fact and humane. Focus on:
- Conditions under which people made decisions
- System constraints and design choices
- Gaps in tooling, training, or process
Not:
- “Alice forgot…”
- “Bob misconfigured…”
If an action is relevant, describe it in context: “The on‑call engineer rotated earlier than usual and had incomplete handoff notes…”
Focusing on Learning, Not Blame
The entire purpose of the catalog is to support learning and improvement. That means:
- No witch hunts – Names only appear when necessary to understand roles, not to assign fault.
- Systemic perspective – Ask, “What conditions made this outcome likely?” instead of “Who screwed up?”
- Normalize fallibility – Incidents are treated as expected signals from a complex system, not aberrations caused by “bad actors.”
You can encode this in practice by:
- Including a “System Contribution” section on each card:
- e.g., “Pager volume doubled that week due to other known issues.”
- Avoiding single‑cause stories: require at least two contributing factors per card.
- Regularly asking in review sessions: “If a different person were in the same situation, could this still have happened?”
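If the cards are also indexed digitally, the "at least two contributing factors" rule can be checked mechanically rather than by memory. A small sketch, assuming each indexed card is a plain dict with an illustrative contributing_factors field:

```python
# Sketch: flag single-cause stories in a digital card index.
# Field names ("title", "contributing_factors") are illustrative.

def review_notes(card: dict) -> list[str]:
    """Return review notes for a card; an empty list means it passes."""
    factors = card.get("contributing_factors", [])
    if len(factors) < 2:
        return [f"{card.get('title', '?')!r} lists only {len(factors)} "
                "contributing factor(s); the catalog asks for at least two."]
    return []

index = [
    {"title": "The Tuesday Morning Auth Meltdown",
     "contributing_factors": ["Incomplete handoff notes"]},  # single-cause: flagged
]
for card in index:
    for note in review_notes(card):
        print(note)
```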
Over time, the drawers themselves become a visible argument: it’s the system, over and over, in different ways.
Building a Taxonomy You Can File and Browse
To make the catalog browsable and pattern‑friendly, you need a consistent taxonomy. It doesn’t have to be perfect; it just has to be stable enough that people can find things.
A simple top‑level structure could be:
- Technical Factors
- Infrastructure / capacity
- Software defects
- Integration / dependencies
- Monitoring / alerting gaps
- Human Factors
- Attention / workload
- Communication / handoffs
- Training / experience
- Interface usability
- Process & Policy Gaps
- Missing or outdated runbooks
- Ambiguous ownership
- Approval / change control issues
- Incomplete procedures
- Organizational Factors
- Staffing / coverage
- Conflicting goals or incentives
- Cultural norms (e.g., heroics, silos)
You can mirror this in your physical setup:
- Drawers or sections for each top‑level category
- Divider cards for subcategories
- Alphabetical or chronological sorting within subcategories
Each incident card may touch multiple categories. To handle that physically:
- File the card under its primary category, and
- Add small colored dots or stickers for secondary categories (e.g., blue = human factors, red = technical).
This allows pattern spotting by simply looking at the colors in a drawer.
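It can also help to keep the same taxonomy in one machine‑readable file next to your digital records, so tags on new cards don't drift from the divider cards in the drawers. A sketch mirroring the four top‑level categories above; blue and red match the example dot colors, while green and yellow are assumptions:

```python
# Sketch: the filing taxonomy as one shared structure. Mirrors the
# top-level drawers and subcategory dividers described above.
TAXONOMY = {
    "Technical Factors": [
        "Infrastructure / capacity", "Software defects",
        "Integration / dependencies", "Monitoring / alerting gaps",
    ],
    "Human Factors": [
        "Attention / workload", "Communication / handoffs",
        "Training / experience", "Interface usability",
    ],
    "Process & Policy Gaps": [
        "Missing or outdated runbooks", "Ambiguous ownership",
        "Approval / change control issues", "Incomplete procedures",
    ],
    "Organizational Factors": [
        "Staffing / coverage", "Conflicting goals or incentives",
        "Cultural norms (e.g., heroics, silos)",
    ],
}

# Sticker colors for secondary-category dots. Blue and red follow the
# example above; green and yellow are assumed for the remaining drawers.
DOT_COLORS = {
    "Human Factors": "blue",
    "Technical Factors": "red",
    "Process & Policy Gaps": "green",
    "Organizational Factors": "yellow",
}

def valid_tag(category: str, subcategory: str) -> bool:
    """Reject tags that drift from the agreed taxonomy."""
    return subcategory in TAXONOMY.get(category, [])

assert valid_tag("Human Factors", "Communication / handoffs")
assert not valid_tag("Human Factors", "Software defects")
```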
Borrowing from HFACS‑Healthcare and Similar Frameworks
You don’t need to invent your classification scheme from scratch. Structured frameworks like HFACS‑Healthcare, a healthcare adaptation of the Human Factors Analysis and Classification System, offer a robust way to categorize contributing factors.
HFACS breaks contributing factors into four layers:
- Unsafe acts (errors, violations)
- Preconditions for unsafe acts (fatigue, communication, environment)
- Unsafe supervision
- Organizational influences (culture, resource management)
For a software or healthcare environment, you can adapt this by:
- Mapping “unsafe acts” to frontline actions and design decisions
- Mapping “preconditions” to tooling, workload, environment, interfaces
- Mapping “unsafe supervision” to on‑call structures, review practices, leadership decisions
- Mapping “organizational influences” to culture, policies, funding, priorities
On each card, add a small section:
HFACS layer(s): Preconditions, Organizational Influences
This achieves two things:
- It keeps your analysis repeatable and structured over time.
- It surfaces patterns like “Wow, 60% of our incidents involve organizational influences we never talk about.”
You can borrow from other frameworks as well (STAMP, SEIPS, etc.), but keep the physical card fields simple enough to complete in 5–10 minutes.
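If you track the HFACS field digitally, a controlled vocabulary keeps the tags repeatable, which is what makes statistics like the 60% example trustworthy. A minimal sketch of the four layers as a Python enum, with the adaptations above noted as comments:

```python
# Sketch: the four HFACS layers as a controlled vocabulary for the
# "HFACS layer(s)" card field. Comments note the adaptations above.
from enum import Enum

class HfacsLayer(Enum):
    UNSAFE_ACTS = "Unsafe acts"                   # frontline actions, design decisions
    PRECONDITIONS = "Preconditions"               # tooling, workload, environment, interfaces
    UNSAFE_SUPERVISION = "Unsafe supervision"     # on-call structures, review practices
    ORGANIZATIONAL = "Organizational influences"  # culture, policies, funding, priorities

# Illustrative: the example card field from above, as data.
card_layers = [HfacsLayer.PRECONDITIONS, HfacsLayer.ORGANIZATIONAL]
print("HFACS layer(s): " + ", ".join(layer.value for layer in card_layers))
# -> HFACS layer(s): Preconditions, Organizational influences
```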
Making the Catalog Inviting to Browse
The catalog only works if people actually use it. Design it to be physically and socially inviting:
- Place it somewhere central and visible: near a team space, break room, or incident review area.
- Make the drawers or boxes aesthetically pleasing and clearly labeled.
- Use easily legible handwriting or printed labels.
- Color‑code categories for at‑a‑glance understanding.
Then bake browsing into routines:
- Weekly “Failure Flip‑Through”: 10 minutes at the end of a stand‑up; someone randomly pulls a card and tells the story.
- Pre‑deployment reviews: Before a risky change, scan cards related to that system or category.
- Onboarding: New hires spend an hour browsing and picking 3 incidents to discuss with their mentor.
The goal is to make the catalog feel less like an archive and more like a story library the team is proud of.
Turning Insights into Real Change
A beautiful catalog is useless unless it feeds back into how you work.
Connect the drawers to real decisions by:
- Periodic pattern reviews (monthly or quarterly):
- Count how many cards land in each category (a tally‑script sketch follows this list).
- Look for clusters: “We’ve had 7 handoff‑related incidents in 3 months.”
- Summarize 2–3 systemic themes.
- Linking to work tracking:
- For each theme, spin up concrete improvements: runbook updates, training sessions, design reviews, process changes.
- Annotate the cards with a small mark when follow‑ups ship.
- Feeding training and runbooks:
- Use real incidents to create scenario‑based training.
- Embed “lessons from card X” directly into runbooks and design standards.
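Here is that tally sketch: it assumes a hypothetical CSV index of the cards with date (YYYY‑MM‑DD) and primary_category columns, and groups counts by quarter so clusters stand out as plain numbers:

```python
# Sketch: quarterly counts per category for the periodic pattern review.
# Assumes a hypothetical card_index.csv with "date" and "primary_category"
# columns; adapt the column names to however you index your cards.
import csv
from collections import Counter

def quarterly_counts(path: str) -> Counter:
    counts: Counter = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            year, month, _day = row["date"].split("-")
            quarter = f"{year}-Q{(int(month) - 1) // 3 + 1}"
            counts[(quarter, row["primary_category"])] += 1
    return counts

for (quarter, category), n in sorted(quarterly_counts("card_index.csv").items()):
    print(f"{quarter}  {category}: {n}")
# A cluster like "7 handoff-related incidents in 3 months" is just a high
# count in one (quarter, category) cell, easy to spot in this output.
```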
Over time, you’ll start to see before/after comparisons:
- Fewer cards in a specific subcategory after new training or tooling.
- Shorter impact durations as on‑call practices improve.
That’s when the catalog stops being a novelty and becomes a core learning asset.
Conclusion
An analog incident story card catalog seems almost comically simple in a world of dashboards, AI, and real‑time analytics. But its power comes from exactly what digital tools often lack: tangibility, narrative, and shared attention.
By turning each incident into a compact, humane story and filing it into a structured, browsable system, you:
- Keep failures visible, not buried
- Emphasize contributing factors and systemic conditions over blame
- Make it easy to spot recurring patterns
- Ground training, design, and process changes in real history
You don’t need permission to start. Grab index cards, define a lightweight taxonomy, pick a drawer or box, and write the first three incident stories. Place them somewhere people can’t help but notice.
Then, one card at a time, you’ll turn scattered failures into a living library of how your system actually works — and how it can get better.