The Reliability Cardboard City: Building a Tabletop Twin of Your Production Universe With Scissors and Tape

When your production environment spans dozens of services, clouds, vendors, and teams, a single architecture diagram rarely tells the whole story. Diagramming tools are good at drawing boxes and arrows, but they struggle to capture how people, processes, and technology actually behave together under stress.

That’s where a "reliability cardboard city" comes in: a physical, low‑fidelity model of your production universe, built out of cardboard, Post‑its, markers, and tape. It’s a playful idea with serious value—especially for secure design, threat modeling, and incident response.

In this post, we’ll explore how to build a tabletop twin of your systems, why it’s so effective, and how to combine it with NIST‑aligned practices to strengthen resilience.

Why Build a Tabletop Twin at All?

Most teams already have:

Architecture diagrams in Confluence or Lucidchart
Cloud provider topology views
Incident runbooks in wikis or ticketing tools

So why bother with cardboard and tape?

Because physical, low‑fidelity models excel at things digital diagrams are bad at:

Shared understanding: When everything is on the table (literally), it’s much easier for the whole room to see how components relate.
Conversation over precision: Cardboard doesn’t pretend to be exact. That lowers the barrier for people to ask questions, challenge assumptions, and admit they don’t understand something.
Systems thinking: You can walk around it, move pieces, and simulate flows. People start thinking in terms of end‑to‑end behavior, not isolated components.
Engagement: Craft materials are disarming and fun. That matters when you want people to actively participate in reliability and security work instead of tuning out.

A tabletop twin is not a replacement for your diagrams and tools. It’s a complement that makes hidden complexity visible and discussable.

What Is a Reliability Cardboard City?

Think of your production environment as a city:

Services become buildings
Networks and data flows become roads and bridges
External dependencies become neighboring towns
Users and clients become citizens, vehicles, or districts

Your reliability cardboard city is a scaled‑down, abstract physical representation of that world. You’re not modeling every server; you’re modeling key components, relationships, and failure paths.

Typical building blocks might include:

Cardboard boxes for major services (API, auth, payments, data pipeline)
Index cards or Post‑its for smaller components, queues, or jobs
Colored tape or string for network segments, trust boundaries, and data flows
Tokens or figurines for user types, attackers, on‑call roles, or vendors
Colored markers to denote security zones, severity levels, or data classifications

The goal: a tangible, manipulable representation of your system that you can:

Walk through like a story
Attack like an adversary
Break like a chaos engineer
Fix like an incident commander

Using Cardboard to Improve Secure Design

Secure design often lives inside static diagrams and long threat modeling documents. A cardboard city brings it into 3D.

Map Your Architecture and Attack Surfaces

Start by building the core:

Place your critical services as labeled boxes.
Mark trust boundaries with different colored tape (e.g., public internet, DMZ, internal, highly restricted).
Draw or string data flows between components.

Now ask security‑oriented questions as a group:

Where does external traffic first enter the city?
Which “buildings” hold sensitive data?
What are the authentication and authorization gates?
Where are your third‑party integrations and what can they reach?

Use colored stickers or pins to mark attack surfaces:

Red dot: publicly exposed endpoints
Yellow dot: internally exposed but sensitive components
Blue dot: admin interfaces or high‑privilege pathways

Suddenly, your attack surface isn’t an abstract concept—it’s a cluster of red stickers on certain buildings and roads.

Explore Propagation of Failures and Threats

With everything visible, you can physically trace how an attack or failure might propagate:

"If this API is compromised, what buildings can it access?"
"If this database is locked (ransomware or misconfig), which business capabilities vanish?"

Move a token representing an attacker along the roads, following the same routes data or credentials would. This often surfaces implicit trust, missing controls, and over‑permissive paths that documents fail to highlight.
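If you later transcribe the city into a digital record, the attacker-token walk described above amounts to a reachability search over the roads. The following is a minimal Python sketch under stated assumptions: the building names and the `ROADS` map are hypothetical and stand in for whatever you actually taped to the table.

```python
from collections import deque

# Hypothetical roads: each building maps to the buildings it can reach directly.
# Replace these with the roads from your own cardboard city.
ROADS = {
    "public_api": ["auth", "orders"],
    "auth": ["user_db"],
    "orders": ["payments", "orders_db"],
    "payments": ["vendor_gateway"],
    "user_db": [],
    "orders_db": [],
    "vendor_gateway": [],
    "admin_panel": ["user_db", "orders_db"],
}

def reachable_from(start, roads):
    """Return every building a token could reach from `start` by following roads."""
    seen = {start}
    queue = deque([start])
    while queue:
        here = queue.popleft()
        for nxt in roads.get(here, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)  # report only the other buildings
    return seen

print(sorted(reachable_from("public_api", ROADS)))
```

With these made-up roads, a compromised `auth` building reaches only `user_db`, while a compromised `public_api` reaches six buildings. The search only follows the roads you recorded, so undocumented paths found at the table need to be added before the result means anything.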

Cardboard City for Scenario‑Based Threat Modeling

Traditional threat modeling workshops can feel dry and overly theoretical. A cardboard city turns them into scenario‑based tabletop exercises:

Define a scenario: e.g., “Credential stuffing attack on our login endpoint” or “Insider abuses admin panel access.”
Place the attacker token at the appropriate entry point.
As a team, narrate each step:
- What can the attacker see?
- What systems do they hit next?
- Which controls fire (or fail)?
Use physical markers to represent:
- Alerts raised
- Controls blocking actions
- Lateral movement attempts

Because the city is simple and tactile, even non‑security experts can contribute:

Product managers can identify business impacts.
Ops engineers can point out operational realities (latency, maintenance windows, manual steps).
Developers can explain implementation quirks or shortcuts.

The result is more inclusive, realistic threat models that reflect how your system and organization actually behave.

Mapping Incident Response With a Tabletop Twin

Incident response often relies on scattered mental models: each engineer knows a slice of the system, but no one sees the whole picture under pressure.

A reliability cardboard city gives you a shared map for incidents.

Physically Map Severity and Blast Radius

For a chosen incident scenario (e.g., "Payments API latency spike"):

Use colored cards or rings to mark impacted components:
- Red: currently degraded or down
- Orange: at risk because a dependency is degraded
- Green: healthy but critical to the response
Add small flags indicating severity levels (SEV‑1, SEV‑2, etc.).
Trace user impact by placing user tokens where functionality breaks.

As you do this, patterns emerge:

Single points of failure become obvious hubs.
“Minor” components are revealed as critical connectors.
You see where you need better isolation or graceful degradation.
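One way to double-check the hubs you spot on the table is to remove each building in turn and ask whether a user journey still has a route. The Python sketch below assumes a hypothetical `ROADS` map and journey endpoints. It is a brute-force check suited to a small city, not a general graph algorithm.

```python
# Hypothetical dependency roads for one user journey (names are illustrative).
ROADS = {
    "web": ["api"],
    "api": ["cache", "orders"],
    "cache": ["orders_db"],
    "orders": ["orders_db", "queue"],
    "queue": ["billing"],
    "orders_db": [],
    "billing": [],
}

def can_reach(start, goal, roads, removed=None):
    """Depth-first walk from start to goal, skipping the `removed` building."""
    stack, seen = [start], {start}
    while stack:
        here = stack.pop()
        if here == goal:
            return True
        for nxt in roads.get(here, []):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def single_points_of_failure(start, goal, roads):
    """Buildings (other than start and goal) whose loss cuts every route."""
    return sorted(
        b for b in roads
        if b not in (start, goal) and not can_reach(start, goal, roads, removed=b)
    )

print(single_points_of_failure("web", "billing", ROADS))
```

In this made-up city, the journey from `web` to `orders_db` has one single point of failure (`api`) because there are two routes past it, while the journey to `billing` depends on `api`, `orders`, and `queue` in sequence.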

Visualizing Escalation Paths

Next, add the humans:

Tokens for on‑call roles (SRE, app engineer, security, incident commander).
Paths representing escalation flows (who gets paged when, who calls whom, when leadership is engaged).

When people can see their role in the city, you can ask:

Who is overwhelmed in major incidents?
Where do we have single points of human failure (one person who knows a subsystem)?
Are there escalation loops or dead‑ends?

This turns abstract response plans into walkable, improvable workflows.
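If you write the escalation paths down after the session, the loop and dead-end questions can be checked mechanically. This is a minimal Python sketch, assuming each role escalates to at most one next role. The roles and the `ESCALATION` map are hypothetical examples, not a recommended structure.

```python
# Hypothetical escalation paths: each role maps to who it escalates to next.
# None marks an intended end of the chain.
ESCALATION = {
    "sre_on_call": "app_engineer",
    "app_engineer": "incident_commander",
    "incident_commander": "leadership",
    "leadership": None,
    "security_on_call": "security_lead",
    "security_lead": "security_on_call",  # loop: the two page each other
    "support": "vendor_contact",          # dead end: nobody is defined after this
}

def check_path(start, escalation):
    """Walk the chain from `start` and classify it as 'ok', 'loop', or 'dead-end'."""
    seen = set()
    role = start
    while role is not None:
        if role in seen:
            return "loop"
        if role not in escalation:
            return "dead-end"
        seen.add(role)
        role = escalation[role]
    return "ok"

report = {role: check_path(role, ESCALATION) for role in ESCALATION}
for role, status in report.items():
    print(f"{role}: {status}")
```

Real escalation policies often branch (page two people at once, escalate after a timeout), so treat this as a starting point for the conversation instead of a full model.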

Surfacing Hidden Dependencies and Constraints

One of the biggest benefits of analogue modeling is cross‑functional collaboration.

In a room with engineers, security, ops, support, and product all gathered around the city, you’ll often hear:

“Wait, that service depends on that queue? That’s not documented.”
“Legal requires a manual check before we can do that rollback.”
“The vendor’s SLA means this building should be in the ‘unreliable district.’”

By physically placing those dependencies and constraints:

Undocumented interactions become visible, and can be added to formal diagrams later.
Operational realities (manual approvals, vendor SLAs, data residency rules) can be modeled as barriers, gates, or special districts.

The city becomes a bridge between technical and non‑technical perspectives, creating alignment that’s hard to achieve through tickets and PDFs.

Rehearsing Incidents and Attacks: Iterative Testing of Playbooks

A cardboard city is an ideal environment for low‑risk practice:

Choose a realistic incident scenario (outage, data exfiltration, DDoS, etc.).
Follow your existing incident response playbook step by step.
Move tokens and markers as if the incident were unfolding in the city.
Time the response, note decision points, and write down friction.

You’ll quickly spot:

Steps that assume knowledge only one person has.
Missing communications (for example, no one informed customer support).
Tools or dashboards that don’t exist or aren’t easily accessible.

Use these rehearsals to iteratively refine playbooks. Next session, re‑run the scenario with the updated procedures and see what improves.
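To see what improves between sessions, it helps to record how long each playbook step took and compare runs. A small Python sketch, assuming you timed steps in minutes. The step names and numbers below are invented for illustration.

```python
# Hypothetical timings (minutes per playbook step) from two rehearsals.
first_run = {"detect": 12, "page on-call": 8, "contain": 25, "notify support": 30}
second_run = {"detect": 10, "page on-call": 3, "contain": 27, "notify support": 5}

def compare_runs(before, after):
    """Return (step, before, after, change) for steps present in both runs."""
    return [
        (step, before[step], after[step], after[step] - before[step])
        for step in before
        if step in after
    ]

for step, old, new, change in compare_runs(first_run, second_run):
    trend = "faster" if change < 0 else "slower" if change > 0 else "same"
    print(f"{step}: {old} -> {new} min ({trend})")
```

A step that got slower is not necessarily a regression. It may mean the updated playbook added a check that was previously skipped, which is worth noting alongside the numbers.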

Combining Cardboard City With NIST‑Aligned Practices

NIST’s incident response lifecycle, as described in SP 800-61 Rev. 2, has four phases:

Preparation
Detection & Analysis
Containment, Eradication & Recovery
Post‑Incident Activity

Your reliability cardboard city can anchor each phase:

Preparation: Build and maintain the city; map roles, assets, and dependencies. Use it to design and test your incident response plan.
Detection & Analysis: In exercises, show where monitoring and logging live in the city and how alerts propagate to humans.
Containment, Eradication & Recovery: Physically model isolation strategies (cut a “road,” close a “bridge”), failovers, and rollback paths.
Post‑Incident Activity: During postmortems, reconstruct the incident sequence in the city. Mark what actually happened vs. what you thought would happen.

This combination yields a structured yet creative framework: NIST provides the rigor and process, while the cardboard city provides shared context and engagement.
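For the Post‑Incident Activity phase, comparing what you thought would happen with what actually happened can be done as an ordered diff of two step lists. The sketch below uses Python's standard `difflib` module. The two sequences are hypothetical, and matching steps by exact label is a simplifying assumption.

```python
import difflib

# Hypothetical sequences written down during a postmortem.
expected = ["alert fires", "on-call paged", "traffic rerouted",
            "support informed", "service restored"]
actual = ["customer reports issue", "on-call paged", "traffic rerouted",
          "service restored"]

def sequence_gaps(expected, actual):
    """List steps that were planned but skipped, and steps that happened unplanned."""
    matcher = difflib.SequenceMatcher(a=expected, b=actual, autojunk=False)
    skipped, unplanned = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            skipped.extend(expected[i1:i2])
        if tag in ("insert", "replace"):
            unplanned.extend(actual[j1:j2])
    return skipped, unplanned

skipped, unplanned = sequence_gaps(expected, actual)
print("Planned but did not happen:", skipped)
print("Happened but was not planned:", unplanned)
```

Here the diff shows that the alert never fired and support was never informed, while the incident was actually detected through a customer report. Those are the markers to place on the city during the postmortem.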

How to Get Started (Without Overthinking It)

You don’t need perfection. You need a table, some cardboard, and an hour.

Pick a scope: One critical product journey or one major subsystem.
Gather materials: Cardboard, tape, markers, Post‑its, string, tokens.
Invite a cross‑functional group: Engineers, ops, security, product, support.
Build version 0.1: Rough buildings for core services, approximate data flows, a few trust boundaries.
Run a single scenario: A simple outage or attack path.
Capture insights: Take photos, write down surprises, log follow‑ups.

Iterate from there. Add more detail only as it proves useful.

Conclusion: Serious Resilience, Playful Tools

Reliability and security are serious disciplines, but the tools we use to improve them don’t have to be intimidating. A reliability cardboard city turns complex, distributed systems into something you can see, touch, and walk around.

By building a tabletop twin of your production universe, you:

Make architecture, risks, and dependencies tangible.
Enable engaging, accessible threat modeling across roles.
Improve secure design and incident response planning.
Practice and refine NIST‑aligned resilience strategies before real crises.

With nothing more than scissors, tape, and a willingness to experiment, you can help your team see your systems—and their vulnerabilities—in a whole new way.