The Analog Refactor War Game: Running Tabletop Scenarios Before You Touch a Fragile Codebase

Refactoring a fragile legacy codebase can feel less like gardening and more like defusing an unexploded bomb. One wrong move and… production outage, security incident, lost data, ruined weekend.

Instead of diving straight into the code, imagine you ran a tabletop-style war game first. No IDE. No commits. Just diagrams, index cards, and a cross-functional group walking through “what if?” scenarios:

What if this service starts timing out?
What if this field is null but the database assumes it isn’t?
What if a minor template change opens up an XSS hole?

This analog “war game” approach lets you explore risks, map dependencies, and design your strategy before you touch a single fragile line.

In this post, we’ll walk through how to:

Run a refactor war game as a tabletop exercise
Think like a security engineer: treat refactors as potential attack surface changes
Use TDD for all new code added during the refactor
Protect legacy behavior with characterization tests, golden masters, and seams
Map your system like a complex circuit before doing invasive surgery

Why Run a Refactor as a War Game?

A war game is a structured, low-risk way to simulate future events and explore strategies. Militaries and security teams use them to:

Identify blind spots
Expose hidden assumptions
Practice decision-making under constraints

Refactoring a legacy codebase has similar characteristics:

You have incomplete knowledge
The system is already in production
Small changes can have outsized effects
You’re under time and reliability pressure

So instead of “just start refactoring,” treat your effort as a campaign and your meeting room as a war room.

Step 1: Set Up the Tabletop Refactor War Game

You don’t need fancy tools. You need people, paper, and structure.

Who Should Be in the Room

Senior devs / tech lead – understand architecture and tradeoffs
Engineers familiar with the legacy area – know the weird edge cases
QA / test engineers – see failure patterns and blind spots
Security engineer (or someone with that mindset) – map vulnerabilities
Ops / SRE (if available) – understand runtime behavior and blast radius

Artifacts to Bring

High-level architecture diagram
Sequence diagrams or call graphs (generate from traces if needed)
Recent bug and incident reports related to the target area
Existing test coverage reports

Define the Objective

Write it down in one sentence:

“Safely refactor the payment processing module to separate business rules from database access without changing observable behavior.”

That sentence becomes your mission statement for the war game.

Step 2: Map the Codebase Like a Complex Circuit

Before an engineer reworks a complex physical circuit, they study the schematic: voltages, critical paths, components that must never be overloaded.

Treat your codebase the same way.

On a whiteboard or virtual board, draw:

Critical flows (e.g., user signup, checkout, data export)
External interfaces (APIs, message queues, file imports/exports)
Datastores (databases, caches, third-party services)
Shared modules (utilities, legacy libraries everyone depends on)

Then annotate with:

Areas marked “don’t touch” unless absolutely necessary
Hidden coupling (shared global state, singletons, static helpers)
Known dangerous zones (“changing this once broke billing for a week”)

Your goal is not perfection; your goal is a working circuit diagram that shows where the current flows and where a careless cut could black out the system.
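Once the diagram exists on the whiteboard, it can help to transcribe it into something you can query. The sketch below is one way to do that, not a prescribed tool: the module names in `DEPENDS_ON` and the `blast_radius` helper are hypothetical and stand in for whatever your own map contains. It answers the question "if we cut here, what else could go dark?"

```python
from collections import deque

# Hypothetical dependency map copied from the whiteboard:
# each module lists the modules it calls directly.
DEPENDS_ON = {
    "checkout": ["payments", "inventory"],
    "payments": ["db", "legacy_utils"],
    "inventory": ["db"],
    "reports": ["db", "legacy_utils"],
    "db": [],
    "legacy_utils": [],
}


def blast_radius(target, depends_on):
    """Return every module that depends on `target`, directly or transitively."""
    # Invert the edges so we can walk from a module to its callers.
    dependents = {name: set() for name in depends_on}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)

    # Breadth-first walk over callers, then callers of callers, and so on.
    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for caller in dependents.get(current, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


print(sorted(blast_radius("legacy_utils", DEPENDS_ON)))
# prints ['checkout', 'payments', 'reports']
```

A module with a large blast radius is a good candidate for the “don’t touch” annotation, or for extra tests before anyone edits it.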

Step 3: Treat the Refactor Like a Security Wargame

Refactors often change how data flows, where validations happen, and how components interact. That’s exactly where security vulnerabilities creep in.

In your war game, explicitly map the attack surface you might accidentally modify:

Code injection – Are you introducing new dynamic evals, template engines, or plugin loading points?
SQL injection – Are any query builders or ORM layers being touched or bypassed?
XSS (Cross-Site Scripting) – Are you changing how user input is rendered in templates or APIs consumed by UIs?
Authentication/authorization paths – Are you moving or duplicating access checks?

Walk through scenarios:

“We extract this data access layer into a new module. Could any unsanitized input reach SQL now?”
“We introduce a new DTO layer. Are we accidentally passing HTML straight through to the UI without escaping?”
“We move this validation. Could some code path skip it now?”

Write down risks and proposed mitigations. For each high-risk area, you now know you need:

Tests that lock down the behavior
Possibly extra security reviews or static analysis checks
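As an illustration of a test that locks down security-relevant behavior, here is a minimal sketch for the extracted data access scenario. Everything in it is assumed for the example: the `users` table, the `find_user_ids` function, and the use of Python's built-in `sqlite3` module as a stand-in for your real database. The point is only that a classic injection payload should be treated as plain data, both before and after the refactor.

```python
import sqlite3


def find_user_ids(conn, name):
    """Hypothetical extracted data-access function.

    The user-supplied value is passed as a bound parameter, never
    concatenated into the SQL string.
    """
    rows = conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()
    return [row[0] for row in rows]


def make_test_db():
    """Build a throwaway in-memory database with two known users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def test_injection_payload_is_treated_as_data():
    conn = make_test_db()
    payload = "alice' OR '1'='1"
    # If the input were spliced into the query, this would match every row.
    assert find_user_ids(conn, payload) == []
    # Normal lookups still work.
    assert find_user_ids(conn, "alice") == [1]
    # The table is untouched.
    assert conn.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 2
```

If a later change bypasses the parameterized path and builds the query by string concatenation, the first assertion should start failing.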

Step 4: Commit to TDD for All New Code

Trying to retroactively TDD an entire legacy system is unrealistic. But you can draw a line in the sand:

Every new class, function, or module created during the refactor must be covered by tests written first.

That means:

Write a test describing the behavior of the new code you’re adding.
Implement just enough to make it pass.
Refactor the new code confidently, with the test as a safety net.

Benefits:

New code is safe to change and easier to understand.
You gradually carve out testable, reliable islands inside the legacy swamp.
You avoid growing the untested legacy surface even further.

TDD doesn’t fix past sins, but it keeps you from committing new ones.
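To make the test-first loop concrete, here is a small sketch. The `shipping_fee` rule, its threshold, and its flat fee are invented for illustration and are not part of any real module. The tests appear first because they would be written first; they fail until the function below them exists.

```python
from decimal import Decimal


# Step 1: tests written before the implementation.
def test_order_under_threshold_pays_flat_fee():
    assert shipping_fee(Decimal("49.99")) == Decimal("5.00")


def test_order_at_threshold_ships_free():
    assert shipping_fee(Decimal("50.00")) == Decimal("0.00")


def test_negative_total_is_rejected():
    try:
        shipping_fee(Decimal("-1"))
    except ValueError:
        return
    raise AssertionError("expected ValueError for a negative total")


# Step 2: just enough code to make the tests pass.
FREE_SHIPPING_THRESHOLD = Decimal("50.00")  # hypothetical business rule
FLAT_FEE = Decimal("5.00")  # hypothetical business rule


def shipping_fee(order_total):
    """Return the shipping fee for an order total (both as Decimal)."""
    if order_total < 0:
        raise ValueError("order total cannot be negative")
    if order_total >= FREE_SHIPPING_THRESHOLD:
        return Decimal("0.00")
    return FLAT_FEE
```

Step 3, refactoring, happens with these tests in place: you can rename, extract, or restructure `shipping_fee` and rerun them after each change. The functions follow the naming convention that pytest collects, though any runner will do.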

Step 5: Protect Legacy Behavior with Characterization Tests

Legacy code rarely has reliable tests. Before you refactor a fragile area, you need a way to say:

“If I change this, I’ll know whether I broke current behavior.”

Enter characterization tests.

A characterization test doesn’t assert what the behavior should be; it asserts what the behavior is today.

The workflow:

Identify a legacy function or class you need to modify.
Call it with real-world inputs (from logs, production-like data, or fixtures).
Capture its current outputs/side effects.
Write tests that assert: “Given X input, I currently get Y output.”

Even if the behavior is weird or imperfect, you’re documenting reality. Now, when you refactor, tests will tell you if you’ve changed that reality—intentionally or not.
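Here is what that workflow might look like in miniature. The `legacy_format_name` function is a made-up stand-in for real legacy code, quirks included, and the `OBSERVED` table is assumed to have been filled in by running the function and copying down whatever came back, not by deciding what it ought to return.

```python
def legacy_format_name(first, last):
    """Stand-in for a legacy function; hypothetical, including its quirks."""
    if not last:
        return first.upper()
    return last.strip() + ", " + first


# Recorded by calling the function and writing down the results.
# These pairs describe what the code does today, not what it should do.
OBSERVED = [
    (("Ada", "Lovelace"), "Lovelace, Ada"),
    (("Ada", ""), "ADA"),                     # quirk: missing last name shouts
    (("Ada", None), "ADA"),                   # quirk: None behaves like ""
    (("Ada", " Lovelace "), "Lovelace, Ada"),
    ((" Ada", "Lovelace"), "Lovelace,  Ada"), # quirk: first name is not trimmed
]


def test_legacy_format_name_matches_recorded_behavior():
    for args, expected in OBSERVED:
        assert legacy_format_name(*args) == expected, args
```

The double space in the last case looks like a bug, and it may well be one. The test records it anyway, so that fixing it later is a deliberate decision and not an accident of the refactor.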

Over time, you can:

Gradually improve behavior
Tighten assertions
Replace legacy functions with cleaner equivalents

But you do it safely, with a net.

Step 6: Use Golden Master Testing for Complex Behavior

Sometimes behavior is too complex to characterize with a handful of tests:

Output depends on a large number of inputs
There are many edge cases
The code path is long and convoluted

This is where golden master testing shines.

How Golden Masters Work

Record: Capture a large set of realistic inputs and the corresponding outputs from the existing system.
Freeze: Store these outputs as your “golden master.”
Refactor: Change internals, structure, algorithms.
Compare: After each change, run all inputs through the new version and compare the outputs to the golden master.

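If anything changes unexpectedly, you know where to look. For the comparison to be meaningful, outputs must be reproducible: pin or strip values such as timestamps and random IDs before recording.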
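The record and compare steps can be sketched in a few lines. This is a minimal illustration under stated assumptions: `legacy_report`, `refactored_report`, and the sample `INPUTS` are hypothetical, the outputs are assumed to be JSON-serializable, and a real setup would store the master file in version control instead of a temporary directory.

```python
import json
import tempfile
from pathlib import Path


def legacy_report(order):
    """Hypothetical legacy transformation we want to refactor."""
    total = sum(item["qty"] * item["price_cents"] for item in order["items"])
    return {"order_id": order["id"], "total_cents": total, "lines": len(order["items"])}


def refactored_report(order):
    """Hypothetical rewrite that must produce identical output."""
    lines = order["items"]
    return {
        "order_id": order["id"],
        "lines": len(lines),
        "total_cents": sum(line["qty"] * line["price_cents"] for line in lines),
    }


def record_golden_master(func, inputs, path):
    """Record + freeze: run every input through func and store the outputs."""
    outputs = [func(item) for item in inputs]
    path.write_text(json.dumps(outputs, indent=2, sort_keys=True))


def compare_to_golden_master(func, inputs, path):
    """Compare: return the indexes of inputs whose output no longer matches."""
    expected = json.loads(path.read_text())
    if len(expected) != len(inputs):
        raise ValueError("golden master was recorded for a different input set")
    mismatches = []
    for index, (item, want) in enumerate(zip(inputs, expected)):
        # Round-trip through JSON so both sides use the same data types.
        got = json.loads(json.dumps(func(item)))
        if got != want:
            mismatches.append(index)
    return mismatches


INPUTS = [
    {"id": 1, "items": [{"qty": 2, "price_cents": 500}]},
    {"id": 2, "items": []},
    {"id": 3, "items": [{"qty": 1, "price_cents": 199}, {"qty": 3, "price_cents": 100}]},
]

with tempfile.TemporaryDirectory() as tmp:
    master = Path(tmp) / "golden_master.json"
    record_golden_master(legacy_report, INPUTS, master)
    print(compare_to_golden_master(refactored_report, INPUTS, master))  # prints []
```

An empty list means the refactored version reproduced every recorded output. A non-empty list tells you which inputs to replay by hand.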

Golden masters are especially useful for:

Complex report generation
Data transformations and ETL processes
Legacy business rule engines

In your war game, identify modules that are good golden master candidates and plan how you’ll capture representative data.

Step 7: Create Seams to Control Change

Legacy code often resists testing because everything is tangled:

Global state
Hard-coded dependencies
Static singletons

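A seam is a place in the code where you can alter behavior without editing the code at that place. It lets you swap in a test double or a new implementation from the outside, instead of rewriting every caller.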

Examples:

Introducing an interface and an adapter around an external API
Wrapping static calls in a thin delegating class you can mock
Inserting a configuration object instead of direct environment reads
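The last example, a configuration object in place of direct environment reads, might look like the sketch below. The `Settings` class, the `RETRY_LIMIT` variable, and `should_retry` are all hypothetical names chosen for illustration. The commented-out "before" shape shows the kind of code the seam replaces.

```python
import os
from dataclasses import dataclass

# Before (hypothetical legacy shape): the environment read is buried in the
# logic, so a test can only influence it by mutating os.environ.
#
# def should_retry(attempt):
#     return attempt < int(os.environ.get("RETRY_LIMIT", "3"))


@dataclass(frozen=True)
class Settings:
    """Seam: the one place that knows where configuration comes from."""

    retry_limit: int = 3

    @classmethod
    def from_env(cls, environ=None):
        # Production passes nothing and gets os.environ;
        # tests can pass a plain dict instead.
        environ = os.environ if environ is None else environ
        return cls(retry_limit=int(environ.get("RETRY_LIMIT", "3")))


def should_retry(attempt, settings):
    """Same rule as before, but the limit now arrives from outside."""
    return attempt < settings.retry_limit


# In a test, no environment manipulation is needed:
assert should_retry(2, Settings(retry_limit=3))
assert not should_retry(3, Settings(retry_limit=3))
```

Production code would call `Settings.from_env()` once at startup and pass the result down. The behavior is unchanged, but the dependency is now visible and replaceable.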

In your war game, look at your circuit diagram and ask:

Where can we insert seams that make this area testable?
Where can we add indirection without changing behavior?

Often the first step of a refactor campaign is not “better design” but “add a seam so we can test safely.”

Running the War Game: A Sample Agenda

A war game session of roughly 100–110 minutes might look like this:

10 min – Mission & constraints
Define the refactor goal, deadlines, and non-negotiables (uptime, security).
20 min – System mapping
Draw the circuit diagram, critical flows, and dependencies.
20 min – Attack surface review
Identify potential security and reliability risks from the planned changes.
20 min – Testing strategy
Decide:
- Where to apply TDD for new code
- Which areas need characterization tests
- Which components need golden masters
- Where you must add seams first
20–30 min – Scenario walk-throughs
Play out concrete “what if…” scenarios:
- What if this call starts returning null?
- What if this DB query slows down by 2x?
- What if this template escapes differently?
10 min – Action plan
Capture concrete tasks and owners for:
- Creating test harnesses
- Building golden master datasets
- Introducing seams
- Scheduling high-risk code reviews

You leave not with vague confidence, but with a map and a playbook.

Conclusion: Win the Refactor Before You Write Code

The most successful refactors are often won before the first pull request:

You’ve mapped the system like a complex circuit.
You’ve treated architectural change as a security and reliability risk, not just a design exercise.
You’ve committed to TDD for all new code so you don’t grow the legacy mess.
You’ve wrapped fragile areas in characterization tests and golden masters.
You’ve introduced seams that let you change behavior safely.

Tabletop-style war games turn refactoring from a leap of faith into a disciplined campaign. Instead of “we’ll see what breaks in staging,” you simulate, plan, and then execute.

Before you open your IDE on that fragile codebase, grab a whiteboard and some colleagues. Run the war game. Win the refactor on paper first—then make it real in code.