The Cardboard Failure City: Building a Tabletop Model of Your System’s Hidden Neighborhoods of Risk
How a cardboard tabletop ‘city’ of your microservices can reveal hidden risk neighborhoods, change coupling, and failure paths that traditional dependency graphs obscure—and how to use it for powerful incident-response tabletop exercises.
Modern systems are too big for human heads.
Cloud‑native architectures sprawl across dozens or hundreds of microservices, third‑party APIs, queues, data stores, and background jobs. Your actual dependency graph looks less like a neat diagram and more like a bowl of spaghetti someone dropped on the floor.
You might have service maps, tracing tools, and dependency graphs, but when things break at 3 a.m., you discover there are still “neighborhoods” of your system nobody really understands. Those are your hidden neighborhoods of risk.
What if you could walk through them—literally?
This is where the Cardboard Failure City comes in: a physical tabletop model of your system that turns abstract dependencies into something you can see, point at, and move around.
Why a City? Why Cardboard?
Dependency diagrams and architecture charts have a crucial problem: they don’t scale with human cognition.
- A few services? A graph works fine.
- A few dozen? It’s already noisy.
- A few hundred? The visualization becomes unreadable.
Even with interactive tools, you end up panning and zooming, and only experts can interpret what they’re seeing.
A city metaphor solves this by giving your system structure and spatial meaning:
- Services become buildings (tall ones, wide ones, grouped into districts).
- Dependencies become roads (one‑way streets, main highways, narrow alleys).
- Shared infrastructure becomes public utilities (power stations, water plants, subway lines).
And cardboard matters because it’s:
- Physical – You can gather around it as a group.
- Cheap – No one is afraid to tear it apart or rearrange it.
- Creative – It lowers the threshold for experimentation. This isn’t a “formal diagram” you’re scared to touch; it’s a prototype playground.
Like building a cardboard airplane to test aerodynamic ideas, building a cardboard city is a low‑risk way to experiment with your system’s architecture and failure modes.
Step 1: Mapping Your System to a City
Start simple. You’re not trying to replicate every single implementation detail. You’re trying to make risk and coupling visible.
1. Choose Your Building Units
Use whatever you have:
- Small cardboard boxes
- Folded index cards or sticky notes
- 3D‑printed blocks
Each building represents a service or a key component:
- Microservices
- Databases
- Message queues
- Third‑party APIs (as “out‑of‑town” buildings near the edge)
Label each building clearly with the service name.
2. Lay Out Districts
Group related services into districts:
- “Payments District” – billing, invoicing, transaction processor
- “User District” – authentication, profiles, permissions
- “Content District” – catalog, search, recommendation engine
These districts are your starting point. In Step 2 you will refine them into neighborhoods of risk.
3. Add the Roads (Dependencies)
Use string, tape, markers, or yarn to represent dependencies:
- Lay or draw arrows on the table to show who calls whom.
- Use thicker lines for heavier traffic or critical paths.
- Represent asynchronous communication (e.g., via queues) with a different color.
You now have a basic physical dependency map. But the real value comes when you start to walk through it as if it were a real city.
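If you want a digital record of the tabletop layout, the same three ideas (buildings, districts, roads) fit in a very small data structure. The following is only a sketch: the `City` class, its method names, and the specific roads are assumptions made for illustration, not something taken from a real system.

```python
from dataclasses import dataclass, field


@dataclass
class City:
    """A tabletop city: buildings grouped into districts, joined by roads."""

    # building name -> district name
    districts: dict = field(default_factory=dict)
    # (caller, callee) -> road style, "sync" or "async"
    roads: dict = field(default_factory=dict)

    def add_building(self, name, district):
        self.districts[name] = district

    def add_road(self, caller, callee, style="sync"):
        # Refuse roads to buildings nobody has placed on the table yet.
        for name in (caller, callee):
            if name not in self.districts:
                raise KeyError(f"unknown building: {name}")
        self.roads[(caller, callee)] = style

    def cross_district_roads(self):
        """Roads whose two ends sit in different districts."""
        return sorted(
            road for road in self.roads
            if self.districts[road[0]] != self.districts[road[1]]
        )


city = City()
city.add_building("billing", "Payments District")
city.add_building("invoicing", "Payments District")
city.add_building("authentication", "User District")
city.add_building("catalog", "Content District")
city.add_building("search", "Content District")

# Hypothetical roads, chosen only to illustrate the structure.
city.add_road("billing", "authentication")
city.add_road("invoicing", "billing", style="async")
city.add_road("search", "catalog")

print(city.cross_district_roads())
```

Roads that cross district boundaries are worth a second look on the table, since they are the routes along which a problem in one district can reach another.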
Step 2: Revealing Hidden Neighborhoods of Risk
Not all districts are equally dangerous. Some are quiet suburbs; others are unstable neighborhoods where outages spread quickly.
Use Change Coupling to Find Risky Districts
Change coupling describes how often services change together, whether or not a dependency diagram shows a link between them. Often, these are:
- Services worked on by the same team
- Services sharing data models or schema
- Services with tightly coupled behavior
To model this:
- Look at your last several months of deployments.
- Identify services that frequently appear in the same pull requests, stories, or release notes.
- Cluster these buildings closer together in your cardboard city.
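The counting behind this can be sketched in a few lines. The example below assumes you have already exported your history as a list of changesets, each one the set of services touched by a single pull request or release. The function name, the `min_together` threshold, and the sample data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def change_coupling(changesets, min_together=2):
    """Count how often each pair of services appears in the same changeset.

    Returns only the pairs seen together at least `min_together` times.
    """
    pair_counts = Counter()
    for services in changesets:
        # Sort so that ("a", "b") and ("b", "a") count as the same pair.
        for pair in combinations(sorted(set(services)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_together}


# Made-up history: each set is one pull request or release.
changesets = [
    {"billing", "invoicing"},
    {"billing", "invoicing", "transaction-processor"},
    {"catalog", "search"},
    {"billing", "invoicing"},
    {"profiles"},
]

coupled = change_coupling(changesets)
print(coupled)
```

In this sample only `billing` and `invoicing` clear the threshold, so those two buildings would move next to each other on the table. A raw count favors services that change often; dividing by each service's total number of changes is one possible refinement.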
Now you get visual neighborhoods like:
- A cluster of small services around a shared database that always ship together.
- A “backend block” that’s constantly under construction.
These are unstable neighborhoods that deserve:
- Extra resilience patterns (circuit breakers, retries, graceful degradation)
- More targeted testing
- Stricter rollout strategies (canary, feature flags)
Highlight Hot Spots
Use color or symbols to mark risk:
- Red stickers for services with frequent incidents
- Yellow for high traffic or critical services
- Blue for external dependencies you don’t control
Visually, you’ll begin to see:
- Risk corridors – chains of red or yellow buildings along a single critical path
- Single points of failure – a lone service every major road passes through
You’ve just created a risk map that anyone can understand at a glance.
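Both patterns can also be checked mechanically once the stickers are on the table. The sketch below assumes each customer journey is written down as an ordered list of buildings and that the red and yellow buildings are collected in one set. The journeys, the flagged set, and the function names are illustrative assumptions.

```python
def shared_by_all(journeys):
    """Buildings that every journey passes through (candidate single points of failure)."""
    common = None
    for stops in journeys.values():
        common = set(stops) if common is None else common & set(stops)
    return common or set()


def longest_risk_run(stops, flagged):
    """Length of the longest unbroken chain of flagged buildings along one journey."""
    best = run = 0
    for building in stops:
        run = run + 1 if building in flagged else 0
        best = max(best, run)
    return best


# Hypothetical customer journeys through the city.
journeys = {
    "browse": ["authentication", "catalog", "search"],
    "purchase": ["authentication", "catalog", "billing", "transaction-processor"],
    "edit profile": ["authentication", "profiles"],
}

# Buildings carrying a red or yellow sticker (made-up example).
flagged = {"authentication", "billing", "transaction-processor"}

print(shared_by_all(journeys))
for name, stops in journeys.items():
    print(name, longest_risk_run(stops, flagged))
```

Here `authentication` sits on every journey, and the purchase journey ends in a run of two flagged buildings, which is the kind of corridor worth discussing first.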
Step 3: Turning It into an Incident-Response Tabletop Exercise
Now turn your city into a disaster simulation board.
Design Failure Scenarios as City Events
Instead of saying “the payments‑service is down,” say:
- “There’s a power outage in the Payments District.”
- “Construction has blocked the main road between Auth and Checkout.”
- “The external map provider just left the city; all those roads are gone.”
For each scenario:
- Physically mark the failure – place a card saying “Down” on a building, cut a road (remove a string), or cover a district with a “disaster” card.
- Ask the team: What breaks next? Which streets are now impassable? Which buildings lose access? What customer journeys are impacted?
- Trace failure propagation by following the roads.
You’re not just asking, “What does this service do?” You’re asking, “What happens to the neighborhood when this block catches fire?”
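Following the roads by hand is the point of the exercise, but the same trace can be sketched in code to check the group's answer afterwards. This example makes a deliberately pessimistic assumption: every building that calls a failed building, directly or indirectly, is impacted, with no fallbacks. The road list and the function name are hypothetical.

```python
from collections import deque


def blast_radius(roads, failed):
    """Return every building that directly or indirectly calls `failed`.

    `roads` is a list of (caller, callee) pairs.
    """
    # Reverse index: callee -> set of buildings that call it.
    callers = {}
    for caller, callee in roads:
        callers.setdefault(callee, set()).add(caller)

    impacted = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller != failed and caller not in impacted:
                impacted.add(caller)
                queue.append(caller)
    return impacted


# Hypothetical roads, written as (caller, callee).
roads = [
    ("checkout", "authentication"),
    ("checkout", "billing"),
    ("billing", "transaction-processor"),
    ("invoicing", "billing"),
    ("search", "catalog"),
]

print(sorted(blast_radius(roads, "transaction-processor")))
```

If the tabletop discussion and the trace disagree, that disagreement is useful: either a road is missing from the model, or someone knows about a fallback the model does not show.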
Practice Response as a Team
Run the exercise like an incident:
- Assign roles: incident commander, comms, subject matter experts.
- Set a scenario: “At 10:07, the Payments District goes dark.”
- Ask teams to:
  - Identify impacted flows and customers.
  - Decide mitigation strategies (rerouting, feature flags, fallbacks).
  - Discuss communication needs: Who needs to know, and how fast?
Because the model is physical and shared, even non‑experts—support, product, leadership—can follow the story and see why certain mitigations take time.
Step 4: Using the City as a Safe Prototyping Space
A cardboard city is intentionally low fidelity. That’s the point.
Like a cardboard aircraft in a wind tunnel, it’s a sandbox for exploring fragile structures without risk.
Use it to ask:
- “What if we removed this building?”
  - Could any flows still work? What would we need to change?
- “What if we split this monolith building into three smaller ones?”
  - How many new roads appear? Does the neighborhood become more or less fragile?
- “What if this external supplier fails monthly?”
  - Can we route around it? Should we add a “backup road” to another provider?
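The “backup road” question can be tried out in a rough sketch before anyone cuts new cardboard. The model below is an assumption, not a standard technique: each building lists groups of providers, and it keeps working as long as at least one provider in every group is still up. The building names `map-provider` and `backup-map-provider` and the function name are hypothetical.

```python
def still_working(needs, down):
    """Return the buildings that keep working once the `down` buildings fail.

    `needs` maps each building to a list of provider groups (sets).
    A building works only if every group has at least one working provider.
    Providers that are not keys of `needs` are treated as unavailable.
    """
    working = set(needs) - set(down)
    changed = True
    while changed:  # repeat until no further building drops out
        changed = False
        for building in sorted(working):
            if any(not (group & working) for group in needs[building]):
                working.discard(building)
                changed = True
    return working


# Today: checkout has exactly one road to a map provider.
needs = {
    "checkout": [{"billing"}, {"map-provider"}],
    "billing": [],
    "map-provider": [],
}

# What if: add a backup road to a second provider.
with_backup = dict(needs)
with_backup["checkout"] = [{"billing"}, {"map-provider", "backup-map-provider"}]
with_backup["backup-map-provider"] = []

print(sorted(still_working(needs, down={"map-provider"})))
print(sorted(still_working(with_backup, down={"map-provider"})))
```

In the first layout `checkout` goes dark along with the provider. In the second it stays up, which gives the group something concrete to weigh against the cost of maintaining a second integration.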
Encourage people to:
- Rearrange districts to reflect desired future architectures.
- Propose new roads (dependencies) and discuss their risk.
- Experiment with resilience patterns (e.g., creating bypass routes, caches, or local fallbacks as new roads and side streets).
Because it’s just cardboard, there’s no fear of being wrong. The goal is learning, not precision.
Step 5: Make It a Team Sport
The biggest benefit of the Cardboard Failure City isn’t the model itself; it’s the conversations it creates.
Co-Build, Don’t Delegate
Avoid having the architecture or SRE team build this alone. Instead:
- Involve engineers from multiple teams.
- Include product managers, designers, and support if possible.
- Ask each team to place “their” buildings and explain how they connect.
This process will:
- Uncover hidden assumptions (“Wait, I thought we only depended on that read replica.”).
- Reveal knowledge silos (“Only that one person knows what this building really does.”).
- Improve shared mental models across the org.
Keep It Alive
Your city isn’t a one‑off workshop prop; it’s a living model:
- Update it on a regular cadence (e.g., quarterly architecture reviews).
- Add new buildings as new services appear.
- Run new failure scenarios after each major change.
Over time, your team will gain a shared language of geography:
- “We’re deploying a new service in the Checkout District.”
- “This change introduces a new alley that bypasses the Auth main road.”
- “That’s already a fragile neighborhood; let’s tread carefully.”
This shared language makes it easier to talk about risk without needing to open a dozen tabs and dashboards.
Conclusion: See Your System Like a City, Not a Diagram
Large systems fail in ways that are spatial and relational, not just technical. Outages spread like fires through districts and along main roads—rarely in the clean lines of a dependency chart.
By building a Cardboard Failure City, you:
- Turn invisible coupling into visible roads and blocks.
- Use change coupling to find unstable neighborhoods.
- Embed incident‑response tabletop exercises into a tangible map.
- Create a low‑cost prototyping lab for exploring failure and resilience.
- Strengthen shared understanding and communication across teams.
You don’t need perfect fidelity. You need something good enough that people can gather around, point at, and say, “If this burns, what else burns with it?”
Once you’ve walked your system’s streets and alleys together, you’ll never look at your architecture the same way again—and your next real incident will feel a little less like wandering a dark, unfamiliar city with no map.