The Analog Architecture Sandbox: Modeling Distributed Systems With Paper Tokens Before You Touch Kubernetes
How to use tabletop, paper-token exercises to model distributed systems, explore microservice designs, and expose failure modes before you commit to Kubernetes and complex infrastructure.
When teams decide to move from a monolith to microservices or start planning their first Kubernetes deployment, they often jump straight into YAML, Helm charts, and cloud consoles. But there’s a simpler, cheaper, and far more human way to explore distributed architectures first:
Build them on a table with paper tokens.
This “analog architecture sandbox” turns your systems into a physical game board. You use tokens to represent services, queues, topics, and users, then play through realistic scenarios: traffic spikes, outages, partial failures, and incident response.
The result is a shared, visual understanding of how the system behaves before you invest in infrastructure.
Why Model Distributed Systems With Paper?
Distributed systems are hard because:
- Behavior is emergent, not obvious.
- Failures are partial and messy, not binary.
- Communication paths (both between services and humans) are complex.
Diagramming tools and architecture docs help, but they’re static. A tabletop exercise is dynamic and collaborative:
- Low cost, high fidelity learning – You can try wild ideas and throw them away in minutes.
- Shared mental model – Engineers, SREs, product managers, and even non-technical stakeholders can all see what’s going on.
- Safe failure lab – You can “break” the system repeatedly with no risk other than running out of sticky notes.
- Pre-Kubernetes sanity check – You validate whether you really need complex infrastructure, and if so, what kind.
Think of it like unit testing your architecture, using paper instead of code.
The Core Idea: Physical Tokens For Logical Patterns
Start by mapping common distributed patterns to physical objects:
- Clients / Users → Stick figures or colored tokens
- Services / Components → Sticky notes or index cards (one component per card)
- Databases / Data stores → Larger cards or special-colored sticky notes
- Queues (e.g., SQS, RabbitMQ) → Rectangles with an arrow and a small “inbox” area for request tokens
- Topics / Pub-Sub (e.g., Kafka) → Circular or central cards with arrows to multiple consumers
- Requests / Messages → Small paper slips or coins that you physically move
- External dependencies (payment gateways, third-party APIs) → Cards on the edge of the table
You’re not trying to be pixel-perfect. The goal is to make patterns visible:
- Is this synchronous or asynchronous?
- Who talks to whom?
- What happens when something is slow, not just down?
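If your team wants a digital twin of the board after the session, the token vocabulary above maps naturally onto a tiny data model. A minimal sketch (all names here are invented for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """A card on the table: service, database, queue, or external dependency."""
    name: str
    kind: str                                   # e.g. "service", "db", "queue", "external"
    inbox: list = field(default_factory=list)   # request tokens currently sitting here

@dataclass
class Token:
    """A paper slip representing one request or message."""
    label: str
    path: list = field(default_factory=list)    # components it has visited, in order

def move(token: Token, component: Component) -> None:
    """'Move' a token onto a component, exactly like sliding a slip across the table."""
    component.inbox.append(token)
    token.path.append(component.name)

# One request travels User -> App -> DB:
app = Component("App", "service")
db = Component("DB", "db")
req = Token("login-request")
move(req, app)
move(req, db)
print(req.path)  # -> ['App', 'DB']
```

The point is not the code itself but that the same three nouns — component, token, move — describe both the physical exercise and any later simulation of it.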
Step 1: Start With The Monolith On The Table
Even if your goal is microservices and Kubernetes, begin with your current or simplest possible architecture. For many teams, that’s a monolith or a basic client–server setup.
1. Draw the monolith
   - Place a big sticky note labeled `App` in the center.
   - Add a `DB` card nearby.
   - Put several `User` tokens at the edge of the table.
2. Simulate a basic request flow
   - Pick up a `Request` token from a user and move it to the `App`.
   - From `App`, move it to `DB`, then back to `App`, then back to `User`.
   - Narrate out loud: "User logs in, app checks DB, returns result."
3. Annotate performance and limits
   - Add small notes: `App: ~500 RPS`, `DB: ~200 writes/sec`, etc.
   - Mark where caching happens, if at all.
Everyone in the room should now understand the baseline system.
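The annotated limits also lend themselves to a quick back-of-the-envelope check before anyone draws a scaling arrow. A sketch, using the illustrative figures from the sticky notes (not real measurements):

```python
# Capacities as written on the sticky notes: requests/sec each component absorbs.
CAPACITY = {"App": 500, "DB": 200}

def bottleneck(flow: dict) -> tuple:
    """Return the component with the least headroom under a given load.

    `flow` maps component name -> requests/sec the scenario pushes through it
    (one login token touches both App and DB once).
    """
    headroom = {name: CAPACITY[name] - rps for name, rps in flow.items()}
    tightest = min(headroom, key=headroom.get)
    return tightest, headroom[tightest]

# 300 logins/sec: each request hits App once and DB once.
name, slack = bottleneck({"App": 300, "DB": 300})
print(name, slack)  # -> DB -100  (the DB saturates first, 100 req/s over its limit)
```

Negative headroom on the board is exactly where tokens will pile up in the scenarios later.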
Step 2: Iteratively “Split” Components Into Microservices
Once the monolith is clear, start exploring microservice candidates.
1. Identify natural seams
   - Look for areas with distinct teams, business domains, or scaling profiles: `Auth`, `Payments`, `Catalog`, `Notifications`, etc.
2. Split the monolith physically
   - Replace parts of the `App` card with separate service cards: `Auth Service`, `Orders Service`, `Inventory Service`.
   - Redraw the flows: move `Request` tokens from `User` → `Gateway` → specific services.
3. Choose interaction styles
   - For synchronous calls: draw direct arrows between service cards.
   - For async patterns: introduce `Queue` or `Topic` cards and move `Message` tokens through them.
4. Ask design questions in real time
   - Should this be a synchronous HTTP call or an event on a topic?
   - If `Inventory Service` is down, should `Orders Service` fail fast, retry, or queue requests?
   - What data is copied vs owned by each service?
By physically rearranging and splitting cards, you’re effectively doing architecture refactoring without touching code or Kubernetes manifests.
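The sync-versus-async question from the list above can be made concrete with a few lines of code. A sketch using an in-process queue as a stand-in for a real topic (service names are illustrative, and a real broker replaces `queue.Queue` in production):

```python
import queue

# Synchronous style: Orders calls Inventory directly and shares its fate.
def reserve_stock_sync(item: str, inventory_up: bool) -> str:
    if not inventory_up:
        raise ConnectionError("Inventory Service is down")  # the caller fails too
    return f"reserved {item}"

# Asynchronous style: Orders drops an event on a topic; Inventory consumes later.
stock_events = queue.Queue()

def reserve_stock_async(item: str) -> None:
    stock_events.put({"event": "reserve", "item": item})    # fire and forget

def inventory_worker() -> list:
    """Drain the topic once Inventory comes back up."""
    handled = []
    while not stock_events.empty():
        handled.append(stock_events.get())
    return handled

# While Inventory is 'down', async orders still succeed and simply queue up:
reserve_stock_async("widget")
reserve_stock_async("gadget")
print(len(inventory_worker()))  # -> 2 events processed on recovery
```

On the table, this is the difference between a direct arrow (the caller's token waits) and a `Topic` card (the token parks in an inbox and life goes on).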
Step 3: Walk Through Realistic Scenarios As A Team
This is where the analog sandbox becomes powerful. Treat the setup like a tabletop incident-response exercise.
Scenario A: Traffic Spike
- Double or triple the `User` tokens.
- Move `Request` tokens rapidly through the system.
- Watch what piles up:
  - Are tokens stuck at `Gateway`? At `DB`? In a `Queue`?
- Add notes where you'd apply:
  - Auto-scaling (Pods, HPA)
  - Caching (CDN, in-memory cache)
  - Rate limiting or backpressure
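The token pile-up has simple arithmetic behind it: whenever arrivals outpace a component's service rate, the backlog grows linearly. A sketch with illustrative rates:

```python
def backlog_over_time(arrival_rps: float, service_rps: float, seconds: int) -> list:
    """Simulate tokens piling up when arrivals outpace service capacity."""
    backlog = 0.0
    history = []
    for _ in range(seconds):
        backlog = max(0.0, backlog + arrival_rps - service_rps)
        history.append(backlog)
    return history

# Baseline: 150 req/s into a service that drains 200 req/s -> nothing accumulates.
print(backlog_over_time(150, 200, 10)[-1])  # -> 0.0

# Spike: triple the user tokens to 450 req/s -> backlog grows by 250 every second.
print(backlog_over_time(450, 200, 10)[-1])  # -> 2500.0 queued requests after 10s
```

This is also why rate limiting and backpressure matter: a queue only buys time, it does not change the arithmetic.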
Scenario B: Partial Outage
- Choose one service card (e.g., `Payments Service`) and flip it over: "DOWN".
- Continue moving `Request` tokens as if users are still using the system.
- Answer as a group:
  - What degrades? What still works?
  - Do we show a degraded UI or a hard error?
  - Where do failed requests go: lost, retried later, or queued?
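The "degrade or hard-fail" decision the group makes at the table translates directly into a fallback branch in the calling service. A minimal sketch, assuming a hypothetical checkout flow that parks payments for later retry:

```python
def checkout(cart: list, payments_up: bool) -> dict:
    """Accept an order even when Payments Service is down (illustrative fallback)."""
    if payments_up:
        return {"status": "paid", "items": cart}
    # Degraded path: take the order anyway and queue the payment for retry,
    # rather than showing the user a hard error.
    return {
        "status": "payment-pending",
        "items": cart,
        "note": "Payments down; order queued for retry",
    }

order = checkout(["book"], payments_up=False)
print(order["status"])  # -> payment-pending, not a hard error
```

Whether this is acceptable is a product decision, which is exactly why non-engineers belong at the table for this scenario.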
Scenario C: Slow Dependency
- Mark
External APIwith "+2 seconds latency". - For each request that touches it, force a short pause before you move tokens on.
- Observe:
- Do requests back up at the calling service?
- Does this slow down everything or just certain flows?
- Where would circuit breakers or timeouts live?
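The circuit-breaker question has a compact answer in code: after enough consecutive timeouts, stop waiting on the slow dependency and fail fast instead. A minimal sketch (real implementations add half-open probing and time windows):

```python
class CircuitBreaker:
    """Minimal breaker: opens after `threshold` consecutive timeouts."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self) -> bool:
        return self.failures >= self.threshold

    def call(self, fn):
        if self.open:
            raise RuntimeError("circuit open: failing fast, not waiting")
        try:
            result = fn()
            self.failures = 0      # any success resets the count
            return result
        except TimeoutError:
            self.failures += 1
            raise

def slow_external_api():
    raise TimeoutError("External API exceeded the 2s budget")

breaker = CircuitBreaker(threshold=3)
for _ in range(3):
    try:
        breaker.call(slow_external_api)
    except TimeoutError:
        pass
print(breaker.open)  # -> True: later callers fail fast instead of piling up
```

On the board, this is the moment you stop letting tokens queue behind `External API` and route them to an error path immediately.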
These scenarios make invisible behaviors tangible. Everyone sees the queues form, the bottlenecks, and the cascading failures.
Step 4: Focus On Roles, Responsibilities, And Communication
Don’t let the session become purely technical. Treat it like a real incident-response tabletop.
1. Add people to the board
   - Represent teams or roles (`Backend Team`, `SRE`, `On-Call`, `Product Owner`) with tokens.
   - Draw lines or arrows to the services they own.
2. Ask organizational questions
   - When `Payments Service` is down, who is paged?
   - Who decides to degrade features vs fully shut down a path?
   - Who has the authority to change configuration in an emergency?
3. Simulate incident communication
   - As you walk through an outage, pause and ask:
     - Who talks to customers?
     - Who coordinates across teams?
     - Where do runbooks live?
Often, the exercise exposes not just technical single points of failure, but ownership and communication gaps.
Step 5: Surface Bottlenecks And Single Points Of Failure
As patterns emerge, annotate the board with risks:
- Bottlenecks
  - Components that accumulate a lot of tokens under load.
  - Services that everything flows through (e.g., `Gateway`, `DB`).
- Single points of failure
  - Cards with no redundancy or fallback paths.
  - Critical external dependencies without graceful degradation.
- Ambiguous ownership
  - Services with no clear team token attached.
  - Shared databases used by multiple services without a clear contract.
Capture these findings in a list:
- "`Orders Service` depends synchronously on `Inventory` and `Pricing` — risk of cascading failure."
- "Single write-heavy DB for all services — scaling and contention risk."
- "No defined owner for `Notification Service` — unclear incident path."
This risk list will become input to your actual Kubernetes and infrastructure design.
Step 6: Align With The C4 Model For Better Diagrams Later
To avoid your insights dying on sticky notes, map what you’ve built to a more formal diagramming approach such as the C4 model:
1. Level 1 (System Context)
   - Your entire table as one system, plus external users and dependencies.
2. Level 2 (Containers)
   - Each card representing a deployable component (service, DB, queue) is a C4 container.
3. Level 3 (Components)
   - If you broke a service card into sub-cards (e.g., `API`, `Worker`), those represent components.
When the session finishes:
- Take photos of the table from multiple angles.
- Translate the layout into a C4 diagram using your favorite tool.
- Annotate trust boundaries, protocols (HTTP, gRPC, events), and deployment environments.
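Transcribing the photos can itself be scripted. As one possible approach (Mermaid is chosen arbitrarily here; C4-specific tooling such as Structurizr works the same way), a sketch that renders the edges you captured as a text-based diagram:

```python
def to_mermaid(edges: list) -> str:
    """Render captured (source, protocol, target) edges as a Mermaid flowchart."""
    lines = ["graph TD"]
    for src, protocol, dst in edges:
        lines.append(f"    {src} -->|{protocol}| {dst}")
    return "\n".join(lines)

# Edges transcribed from the table photos (illustrative).
board = [
    ("User", "HTTP", "Gateway"),
    ("Gateway", "gRPC", "Orders"),
    ("Orders", "event", "Inventory"),
]
print(to_mermaid(board))
```

Keeping the layout as plain text means the diagram lives in version control next to the manifests it eventually informs.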
Now your analog sandbox has a direct path to architecture diagrams and implementation plans that developers and platform engineers can execute on.
When To Run An Analog Architecture Sandbox
Use this technique when:
- You’re considering a monolith-to-microservices migration.
- You’re planning your first Kubernetes or service-mesh rollout.
- You’re designing a new product with distributed components.
- You’ve had a painful incident and want to understand systemic behavior.
Sessions can be short and focused:
- 60–90 minutes for a single scenario and architecture slice.
- Half-day workshops for more complex landscapes.
Invite a cross-functional group: engineers, SREs, architects, product, and if relevant, support or operations.
Conclusion: Design In Paper Before You Design In YAML
Kubernetes, service meshes, and cloud-native tooling are powerful—but they can also lock in complexity and amplify design mistakes.
A simple tabletop exercise with paper tokens lets you:
- Visualize and interrogate distributed patterns.
- Explore microservice boundaries safely.
- Practice incident response and communication flows.
- Expose bottlenecks, ownership gaps, and single points of failure.
- Feed directly into structured diagrams like C4 and, eventually, into clean Kubernetes manifests.
Before you spin up a cluster or write your first Helm chart, clear a table, grab some sticky notes, and build your architecture where everyone can see it. The cheapest, fastest place to break your system is on paper.