Rain Lag

The Clockwork Garden: Growing Failure-Resistant Systems With Analog Incident Stories

How a hand-tuned paper ecosystem, tactile tools, and a “software gardening” mindset can transform incident analysis into a calmer, more insightful way to build failure-resistant systems.

Introduction: When Postmortems Feel Like Autopsies on a Moving Patient

Most teams only really look at their systems when something breaks. The ritual goes by many names (postmortem, incident review, root-cause analysis), but the variants tend to share the same problems: stressful meetings, rushed timelines, digital tools that feel cold and abstract, and a narrow hunt for a single “root cause.”

The “Analog Incident Story Clockwork Garden” proposes a very different approach. Drawing from safety and reliability engineering, it treats software and systems less like finished machines and more like living ecosystems that must be tended over time.

Instead of starting with logs and dashboards, it starts with paper, pens, simple props, and human senses. It invites teams to map incidents as stories, explore chains of events, and gently surface systemic vulnerabilities. The result is a style of incident analysis that is calmer, more collaborative, and surprisingly powerful.

This post walks through the key ideas behind the Clockwork Garden and the connected Software Gardening Almanack—and how analog tools can help teams grow failure-resistant systems.


From Root Cause to Event Chains and Ecosystems

Traditional incident analysis often asks: “What was the root cause?” It assumes a mostly stable machine that occasionally breaks in a single, identifiable spot. Safety and reliability engineering, especially in complex domains like aviation and healthcare, paints a more nuanced picture:

  • Failures rarely have a single cause. They emerge from chains of events and from interactions among components, humans, tools, and environments.
  • Local fixes can hide system-level vulnerabilities. Patching one bug may leave the underlying pattern of risk untouched.
  • Context matters. The same component can be safe in one configuration and dangerous in another.

The Analog Incident Story Clockwork Garden borrows this mindset. It emphasizes:

  • Event chains: Looking at how incidents unfold over time, not just where they ended.
  • Failure modes: Asking “In what ways can this fail?” instead of “Did this component work?”
  • System-level vulnerabilities: Spotting recurring structures—like brittle dependencies, overworked humans, or poorly understood components—that keep reappearing across incidents.

By telling incidents as stories—with characters, settings, and turning points—teams can see beyond isolated defects and start recognizing their system as a dynamic, evolving ecosystem.


The Software Gardening Almanack: Tending Systems, Not Shipping Artifacts

Many software projects are built and then treated as if they were finished objects: ship the release, close the ticket, move on. The Software Gardening Almanack pushes back on that mentality by changing the metaphor.

Instead of constructing a product once, it suggests that teams garden their software:

  • Plants = components and services that grow, decay, and interact.
  • Soil = infrastructure and organizational culture that shapes what can thrive.
  • Weather = external factors—users, regulations, hardware changes, market shifts.

From this perspective, reliability isn’t achieved by a single heroic fix. It’s a continuous practice:

  • You prune dead code and unused features.
  • You fertilize key components with tests, documentation, and observability.
  • You weed out design patterns that keep leading to incidents.
  • You rotate crops by refactoring and gradually replacing aging or brittle modules.

This gardening metaphor becomes especially powerful in scientific software, where sustainability and reproducibility are chronic challenges. Research code often lives for years, evolving through multiple generations of students and collaborators. The Almanack encourages teams to:

  • Document assumptions as if leaving notes for the “next gardener.”
  • Design for resilience (graceful degradation, clear failure modes) rather than fragile success.
  • Build reproducible environments where experiments can be re-run and validated.

Used together, the Clockwork Garden and the Almanack frame software as living systems that must be observed, nurtured, and occasionally reshaped. Incidents are not embarrassments to hide, but growth opportunities for the ecosystem.


Why Analog? The Power of a Paper Ecosystem

In an age of dashboards, AI incident assistants, and sprawling wiki tools, choosing paper and simple physical artifacts can feel odd. But the Analog Incident Story Clockwork Garden is built on purposefully low-tech tools because they:

  1. Slow things down to human speed.

    • Writing on cards or sticky notes forces participants to think and phrase events clearly.
    • It gives everyone time to process, not just the most vocal or the fastest typist.
  2. Make structure visible and tangible.

    • Event cards can be laid out on a table or wall, rearranged into timelines, grouped into clusters.
    • Dependencies, delays, or missing information literally show up as gaps or tangled lines.
  3. Support multiple senses.

    • Visual (cards, diagrams, colors), tactile (handling pieces, moving them), and even auditory cues (reading events aloud, using small chimes or tokens) all engage different modes of reasoning.
  4. Encourage participation across roles and expertise levels.

    • A printed timeline or hand-drawn map is less intimidating than a dense metrics screen.
    • Non-engineers, new team members, and subject-matter experts can all contribute.

This paper ecosystem doesn’t replace digital tooling. Logs, traces, and dashboards are still vital. But the analog layer acts as a bridge between raw data and human understanding, helping teams:

  • Spot patterns across multiple incidents.
  • Rehearse “what if” scenarios by rearranging event sequences.
  • See how organizational decisions (staffing, schedules, policies) intersect with technical failure modes.
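
Once several card walls have been transcribed, spotting patterns across incidents can be as simple as counting which tags keep coming back. The sketch below is one possible way to do that, not a prescribed part of the Clockwork Garden. The incident names, the tag wording, and the `recurring_patterns` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical transcription of card walls from three past incidents.
# Each incident maps to the set of pattern tags the group wrote on its cards.
incidents = {
    "incident-a": {"brittle dependency", "alert ignored", "single on-call"},
    "incident-b": {"brittle dependency", "unclear runbook"},
    "incident-c": {"alert ignored", "brittle dependency", "single on-call"},
}


def recurring_patterns(incidents, min_count=2):
    """Return (tag, count) pairs for tags seen in at least min_count incidents."""
    counts = Counter(tag for tags in incidents.values() for tag in tags)
    # Most frequent first; ties are broken alphabetically for stable output.
    return sorted(
        ((tag, n) for tag, n in counts.items() if n >= min_count),
        key=lambda pair: (-pair[1], pair[0]),
    )


patterns = recurring_patterns(incidents)
for tag, n in patterns:
    print(f"{n} incidents: {tag}")
```

Using sets per incident means a tag counts once per incident, however many cards carried it, so the tally reflects how widespread a pattern is rather than how loudly one session discussed it.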


Sensory Tools for Collaborative Incident Exploration

A key insight behind the Clockwork Garden is that tools shape conversations. If your only tools are spreadsheets and error graphs, your discussion stays narrow and analytic. Analog, sensory tools change the tone and depth of the conversation.

Cost-effective, durable, and easy-to-use artifacts might include:

  • Event cards: One event per card, written in simple language (“Alert fired”, “On-call acknowledged”, “Patch deployed”). These become the basic building blocks of incident stories.
  • Colored markers or tokens: Different colors for people, processes, technical components, or environments. They quickly highlight where attention clusters.
  • Thread or string: Physically connect related events, showing dependency chains or information flows.
  • Timelines on butcher paper: Long sheets on the wall where teams place and rearrange events over time.
  • Auditory cues: Soft bells or clicks to mark key transitions when building the story aloud, reinforcing sequencing and turning points.

These tools are intentionally simple and reusable. Teams don’t need specialized training, and they can be packed into portable kits for use in different rooms or even different organizations.
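
If a team later wants to record what the thread showed, the cards and the strings between them map naturally onto a small directed graph. The following is a minimal sketch under that assumption. The card ids, two of the card texts ("Config change merged", "Cache hit rate dropped"), and the `upstream` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical cards, keyed by a short id written on the corner of each card.
cards = {
    "c1": "Config change merged",
    "c2": "Cache hit rate dropped",
    "c3": "Alert fired",
    "c4": "On-call acknowledged",
    "c5": "Patch deployed",
}

# Each piece of thread runs from an earlier card to a card it influenced.
threads = [("c1", "c2"), ("c2", "c3"), ("c3", "c4"), ("c4", "c5"), ("c2", "c5")]


def upstream(card_id, threads):
    """Return ids of every card that can reach card_id by following thread."""
    parents = {}
    for src, dst in threads:
        parents.setdefault(dst, []).append(src)
    seen = set()
    stack = [card_id]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


chain = sorted(upstream("c3", threads))
print([cards[c] for c in chain])
```

Walking the thread backwards from a card answers the question a group often asks at the wall: "what had to happen before this?"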

The result is a shared sense-making space where engineers, operators, researchers, and managers can:

  • Co-construct the incident narrative.
  • Ask clarifying questions as they see gaps appear.
  • Propose alternative sequences (“What if we had noticed this earlier?”).

This collaborative exploration tends to shift focus away from blame and toward system improvement.


Therapy-Inspired Design: Calming the Room to Improve Insight

Incident analysis is often emotionally charged: people feel guilty, defensive, or anxious about reputation and deadlines. The Clockwork Garden intentionally borrows ideas from therapeutic and trauma-informed practices to make the process safer and more productive.

Physically engaging, therapy-inspired tools can:

  • Create a calming ritual.

    • Starting sessions by slowly building the initial timeline or reading events out loud sets a reflective tone.
    • Hands-on actions (placing cards, tracing lines) ground participants in the present.
  • Reduce cognitive overload.

    • Externalizing thoughts onto paper frees working memory.
    • The physical layout acts as a “second brain” for the group.
  • Encourage equal voice.

    • Pass-the-token speaking turns, or inviting each person to place one card at a time, prevent a few voices from dominating.
  • Normalize failure as data.

    • Gentle, nonjudgmental language on cards (“Attempted X”, “Observed Y”) avoids harsh labeling.
    • The focus becomes understanding the conditions under which the system failed, not judging individuals.

When people feel safer and less rushed, they can be more honest about near-misses, confusing interfaces, or workarounds they rely on. That honesty is crucial for uncovering the deep structures of risk that digital dashboards alone can’t reveal.


Putting the Clockwork Garden Into Practice

Adapting these ideas to your own team doesn’t require a full research implementation. You can start small:

  1. Pick an incident that matters but isn’t too raw or politically charged.
  2. Gather your analog kit: index cards, markers, large paper, tape, and some simple tokens.
  3. Invite a cross-functional group: not just engineers, but also operations, support, and domain experts.
  4. Tell the story in events:
    • Write one event per card.
    • Lay them out on a timeline.
    • Mark where people, processes, and tools interacted.
  5. Look for patterns and failure modes:
    • Where were people overloaded?
    • Where did signals get lost or ignored?
    • What dependencies turned out to be brittle?
  6. Frame improvements as gardening tasks:
    • What do we need to prune, fertilize, weed, or replant?
    • What ongoing “maintenance” will reduce the chance of similar incidents?
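
For teams that want to carry the wall into their digital notes, steps 4 and 5 above can be sketched roughly as follows. This is an illustrative transcription format, not something the Clockwork Garden specifies. The `EventCard` fields, the `build_timeline` and `largest_gap` helpers, and the timestamps are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class EventCard:
    at: datetime  # when the event happened, as best the group recalls
    text: str     # wording copied from the card
    kind: str     # what the marker color stood for: "person", "process", or "tool"


def build_timeline(cards):
    """Order cards by time, as they would sit on the wall."""
    return sorted(cards, key=lambda card: card.at)


def largest_gap(timeline):
    """Return (gap, before, after) for the widest silence between adjacent cards."""
    if len(timeline) < 2:
        return None
    pairs = zip(timeline, timeline[1:])
    before, after = max(pairs, key=lambda p: p[1].at - p[0].at)
    return after.at - before.at, before, after


# Hypothetical cards; the times are invented for illustration.
start = datetime(2024, 1, 1)
cards = [
    EventCard(start + timedelta(minutes=47), "Patch deployed", "process"),
    EventCard(start + timedelta(minutes=0), "Alert fired", "tool"),
    EventCard(start + timedelta(minutes=5), "On-call acknowledged", "person"),
]

timeline = build_timeline(cards)
gap, before, after = largest_gap(timeline)
print(f"Longest quiet stretch: {gap} between '{before.text}' and '{after.text}'")
```

The longest quiet stretch is often a good prompt for the group: either nothing happened there, or something happened that nobody wrote a card for.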

Document what you learn in your usual digital tools—but keep the analog session as a recurring practice. Over time, you’ll accumulate not just fixes, but a richer understanding of your system’s ecosystem.


Conclusion: Growing Failure-Resistant Systems, One Story at a Time

The Analog Incident Story Clockwork Garden and the Software Gardening Almanack invite us to rethink how we build and maintain complex software and scientific systems.

  • Incidents are not isolated glitches; they are stories that reveal how our socio-technical ecosystems behave under stress.
  • Analog, sensory tools help humans see and feel patterns that digital tools can obscure.
  • Treating software as a garden—to be nurtured, pruned, and continuously reshaped—leads to more sustainable, resilient, and reproducible systems.
  • Therapy-inspired, physically engaging practices make incident analysis calmer and more inclusive, improving both psychological safety and technical insight.

In a world obsessed with automation and optimization, there is surprising power in stepping back, picking up a pen, and laying a few cards on the table. By cultivating a hand-tuned paper ecosystem around our incidents, we can grow systems that don’t just work most of the time—they fail more gracefully, recover more quickly, and evolve more wisely.

And like any good garden, the work is never really done. That’s the point.
