Rain Lag

The Pencil-Drawn Runway: Designing Analog Preflight Checks for High‑Risk Deploys

How paper checklists, pencil-drawn runways, and tactile team rituals can dramatically improve reliability for high‑risk software deployments—before you ever hit ‘deploy’.

When a pilot takes off in bad weather or from a short runway, they don’t “trust their gut” or eyeball a few gauges. They run a preflight checklist—systematic, repeatable, boring by design. That boredom is why planes don’t fall out of the sky very often.

Yet in software, especially for high‑risk deploys, we still regularly rely on “it should be fine” and a loose mental model of what needs to be true before pushing code.

This post explores how to bring aviation-style preflight discipline to high‑risk deployments, using a surprising ally: pencil and paper. We’ll look at analog “runways,” tactile rituals, scripted checks with tools like Ansible, and how to keep your checklists alive and evolving.


Why Preflight Checklists Belong in Software

A pilot’s preflight checklist isn’t a suggestion; it’s a contract with reality:

  • Every step is explicit—no relying on memory.
  • Every step is verified—not assumed.
  • Every step is repeatable—the same before every flight.

For high‑risk deploys—schema migrations, major refactors, infra upgrades, region failovers—we need the same habits:

  • Systematic: clear, ordered steps that must be satisfied.
  • Repeatable: the same (or improved) checks every time.
  • Observable: artifacts that show what was checked and when.

Your “pilot error” equivalent in software is often:

  • Someone forgetting a dependency.
  • A config flag left at a default.
  • A hidden environment discrepancy between staging and prod.

Preflight checklists are your defense against those errors before they become incidents.


The Analog Runway: Why Start With Pencil and Paper

It’s tempting to jump straight to fancy dashboards and scripts. But for high‑risk changes, start analog.

A “pencil-drawn runway” is a physical representation of your deployment path:

  • A whiteboard or large sheet of paper.
  • Columns or lanes representing phases: Preflight → Taxi → Takeoff → Climb → Cruise (or whatever metaphors work for you).
  • Sticky notes / index cards representing checks, dependencies, and decisions.

Why analog first?

  1. Forces clarity: If you can’t sketch the change on paper, you don’t understand it yet.
  2. Slows you down, on purpose: You notice gaps when you have to physically place each step.
  3. Makes complexity visible: You see how many checks, handoffs, and assumptions are involved.
  4. Invites collaboration: Anyone in the room can walk up, ask questions, or move a card.

Think of the paper runway as the design space for your preflight process. You’ll automate later, but you design here.


Designing Preflight Checks: What Must Be True Before Takeoff

The key question for any high‑risk deploy is:

What must be true about our systems and environment before we push this change?

Turn that into a checklist. Typical categories:

1. Configuration Readiness

  • Feature flags defined, documented, and defaulted safely.
  • Critical configs (timeouts, connection limits, thread pools) validated for the new load pattern.
  • Secrets and credentials present, valid, and rotated if needed.
  • Config drift between staging and production checked (e.g., against config repos or the CMDB).
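A drift check of this kind can be sketched in a few lines of Python. This is a minimal illustration, assuming both environments' settings have already been loaded into flat dictionaries; the key names and the `ALLOWED_DIFFERENCES` set are hypothetical, not taken from any real system.

```python
from typing import Any

# Keys that are expected to differ between environments (hypothetical examples).
ALLOWED_DIFFERENCES = {"db_host", "replica_count"}


def find_drift(staging: dict[str, Any], production: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {key: (staging_value, production_value)} for unexpected differences.

    A key missing on one side is reported with None for that side.
    """
    missing = object()  # sentinel so an absent key counts as drift
    drift = {}
    for key in sorted(set(staging) | set(production)):
        if key in ALLOWED_DIFFERENCES:
            continue
        s_val = staging.get(key, missing)
        p_val = production.get(key, missing)
        if s_val != p_val:
            drift[key] = (
                None if s_val is missing else s_val,
                None if p_val is missing else p_val,
            )
    return drift


staging_cfg = {"db_host": "db.staging", "request_timeout_s": 5, "new_checkout": False}
production_cfg = {"db_host": "db.prod", "request_timeout_s": 30}

for key, (s_val, p_val) in find_drift(staging_cfg, production_cfg).items():
    print(f"DRIFT {key}: staging={s_val!r} production={p_val!r}")
```

An empty result is the binary "no unexpected drift" answer the checklist item asks for; anything else is a list of specifics to resolve or to add to the allowed set deliberately.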

2. Dependency Health

  • All downstream services are reachable and on expected versions.
  • Data stores (DBs, caches, queues) have capacity and correct schema/indexes.
  • Third-party APIs pass sandbox tests, and quotas are confirmed for the expected traffic.
  • Internal libraries and packages are on supported versions with known behavior.
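The reachability part of these checks can be sketched with Python's standard `socket` module. This is only a minimal sketch: it confirms that a TCP connection can be opened, not that the service is healthy or on the expected version, and the `deps` mapping of names to host and port is a hypothetical input format.

```python
import socket


def tcp_reachable(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def check_dependencies(deps: dict[str, tuple[str, int]]) -> dict[str, bool]:
    """Map each dependency name to whether it accepted a TCP connection."""
    return {name: tcp_reachable(host, port) for name, (host, port) in deps.items()}
```

A call such as `check_dependencies({"orders-db": ("db.internal.example", 5432)})` (hostname invented for illustration) returns one boolean per dependency, which keeps each result binary and easy to record.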

3. Environment Consistency

  • Staging mirrors production in relevant aspects: resource limits, flags, data shapes.
  • Network paths (firewalls, security groups, service mesh policies) are in place.
  • OS / runtime versions (Java, Python, Node, etc.) match expectations.
  • Observability hooks (metrics, logs, traces) are wired and tested in lower envs.

4. Operational Preparedness

  • Rollback plan documented, tested, and time‑bounded.
  • On-call staffing confirmed; escalation paths clear.
  • Runbooks updated to cover expected failure modes.
  • Communication plan ready: who to notify, when, and in what channels.

Each item should be phrased as a binary, verifiable statement:

  • Bad: “DB looks okay.”
  • Good: “The orders DB primary is at ≤ 70% disk utilization, replication lag is < 1s, and failover was tested in staging this week.”
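One way to see why the second phrasing is better is that it translates directly into a function returning true or false. The sketch below assumes the three readings have already been collected by some other means; the `DbReadings` type and its field names are hypothetical, and "this week" is read as "within the last 7 days".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbReadings:
    disk_utilization_pct: float
    replication_lag_s: float
    days_since_failover_test: int


def orders_db_ready(r: DbReadings) -> bool:
    """Binary check: every threshold from the checklist item is explicit."""
    return (
        r.disk_utilization_pct <= 70.0
        and r.replication_lag_s < 1.0
        and r.days_since_failover_test <= 7  # assumption: "this week" means 7 days
    )


print(orders_db_ready(DbReadings(65.0, 0.2, 3)))  # True
print(orders_db_ready(DbReadings(82.0, 0.2, 3)))  # False
```

"DB looks okay" has no equivalent function, which is a useful test for whether an item is phrased well enough to go on the checklist.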

From Paper to Script: Automating Preflight with Ansible

Once your analog runway is stable, you can translate it into scripted preflight checks.

Tools like Ansible are ideal because they’re:

  • Declarative: you specify desired state, not just commands.
  • Idempotent: well-written tasks can be re-run safely, and read-only checks change nothing.
  • Auditable: playbooks become versioned artifacts of how you validate readiness.

Example patterns:

  • Config checks: Ansible tasks that verify config files, environment variables, and template rendering.
  • Dependency ping tests: Modules to check TCP connectivity, HTTP health endpoints, or DB connectivity.
  • Environment validation: Ensuring kernel params, runtime versions, and package versions match a known profile.
  • Observability validation: Confirming that test logs and metrics are visible in your monitoring stack.

You can create a preflight.yml playbook that runs before any high‑risk deploy:

ansible-playbook preflight.yml -e env=production

The result should be a simple outcome:

  • CLEAR TO DEPLOY: All checks passed.
  • ⚠️ HOLD: Some checks failed; deployment blocked until resolved.

You’re basically building a mechanized preflight that enforces the truths you agreed on during the analog design phase.
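The aggregation logic behind that two-state outcome is small enough to sketch in plain Python, independent of Ansible. This is an illustration rather than a replacement for a playbook; the `run_preflight` function and the check names are hypothetical.

```python
from typing import Callable

Check = Callable[[], bool]


def run_preflight(checks: dict[str, Check]) -> tuple[str, list[str]]:
    """Run every check; return the verdict and the names of failed checks."""
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False  # a check that crashes is treated as a failure
        if not ok:
            failed.append(name)
    verdict = "CLEAR TO DEPLOY" if not failed else "HOLD"
    return verdict, failed


checks = {
    "config drift": lambda: True,
    "orders DB capacity": lambda: False,
}
verdict, failed = run_preflight(checks)
print(verdict, failed)  # HOLD ['orders DB capacity']
```

Running every check instead of stopping at the first failure is a deliberate choice here: the team sees the full list of blockers in one pass.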


The Obeya-Style Room: Tactile Rituals for Risky Changes

Borrow a concept from lean manufacturing: Obeya, the “big room” where teams visualize work and coordinate in real time.

For high‑risk deployments, set up an “Obeya‑style” environment:

  • A large wall board or whiteboard as your runway.
  • Printed or handwritten checklists for this specific deploy.
  • Status lanes: Not Started → In Progress → Verified.
  • Owners attached to each critical check.

During the preflight session:

  1. Walk the runway from left to right.
  2. Read each check aloud.
  3. Confirm whether it’s covered by scripted automation, manual verification, or both.
  4. Physically move items to “Verified” only when proof is presented.

This tactile, collaborative ritual:

  • Aligns everyone on the current state of readiness.
  • Surfaces assumptions and gaps early.
  • Builds shared ownership of the risk.

You’re not just pushing code; you’re collectively deciding whether the runway is safe.
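If the wall board is ever mirrored in a tool, the rule that a card reaches "Verified" only when proof is presented can be enforced in code. The following is a minimal sketch; the `Card` class, its fields, and the example link are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    NOT_STARTED = "Not Started"
    IN_PROGRESS = "In Progress"
    VERIFIED = "Verified"


@dataclass
class Card:
    check: str
    owner: str
    status: Status = Status.NOT_STARTED
    proof: str = ""

    def start(self) -> None:
        self.status = Status.IN_PROGRESS

    def verify(self, proof: str) -> None:
        """Move to Verified only when some proof (link, output, note) is given."""
        if not proof.strip():
            raise ValueError(f"no proof presented for: {self.check}")
        self.proof = proof
        self.status = Status.VERIFIED


card = Card(check="Rollback tested in staging", owner="on-call SRE")
card.start()
card.verify("https://ci.example/rollback-run")  # hypothetical link to evidence
print(card.status.value)  # Verified
```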


Living Documents: Evolving Checklists After Incidents and Near-Misses

Aviation checklists are not sacred; they’re continuously refined after every incident, accident, or near miss.

Treat your deployment preflight checklists the same way:

  • After an incident, ask: What check could have revealed this earlier?
  • After a near miss, ask: What warning sign did we ignore or never look for?
  • After a smooth but complex deploy, ask: Which checks were most valuable? Which were noise?

Concrete practices:

  • Post-incident updates: Every incident review includes a section, “Checklist additions/changes.”
  • Versioning preflight docs: Store them in Git; tag versions with dates and deploy identifiers.
  • Retire stale checks: Regularly prune items that no longer add signal.
  • Promote proven checks to automation: Start as manual; once stable and well understood, script them.

Over time, your checklist evolves from a guess into a hard‑won body of operational knowledge.
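Pruning can be supported by a simple report of which checks have not failed in a long time. The sketch below rests on a strong assumption: that a check which passed in every one of its last `window` runs is a candidate for review. That is not proof the check is noise, since some checks are valuable precisely because they rarely fire, so the output is a list to discuss, not to delete. The history format is hypothetical.

```python
def stale_candidates(history: dict[str, list[bool]], window: int = 20) -> list[str]:
    """Return checks that passed in every one of their last `window` runs.

    Checks with fewer than `window` recorded runs are never flagged.
    """
    candidates = []
    for name, results in history.items():
        recent = results[-window:]
        if len(recent) == window and all(recent):
            candidates.append(name)
    return sorted(candidates)


history = {
    "tls cert valid": [True] * 25,
    "replication lag": [True] * 18 + [False, True],
    "new check": [True] * 3,
}
print(stale_candidates(history))  # ['tls cert valid']
```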


Putting It All Together: A Sample Flow for High‑Risk Deploys

Here’s a practical end‑to‑end pattern:

  1. Design the runway on paper

    • Map the deploy phases on a whiteboard.
    • Brainstorm risks and required truths with the team.
    • Turn them into specific, binary checklist items.
  2. Tag checks as manual, scripted, or both

    • Manual: Requires human judgment or multi‑team sign‑off.
    • Scripted: Implementable with tools like Ansible.
    • Both: Automated check plus human confirmation.
  3. Implement scripted preflight checks

    • Build a preflight.yml playbook to run with ansible-playbook.
    • Run in staging and fix false positives/negatives.
    • Add the playbook run to your standard deployment pipeline.
  4. Run an Obeya-style preflight session

    • Gather the relevant engineers, SREs, product owner, and on‑call.
    • Walk the checklist; review automation results.
    • Do not proceed if any critical items aren’t satisfied.
  5. Deploy with a clear “GO/NO‑GO” decision

    • One person is designated as the “pilot in command” with authority to say no.
    • The GO decision is based on checklist completion, not vibes.
  6. Review and refine after

    • Capture surprises and gaps.
    • Update both the checklist and the automation.
    • Share learnings with other teams.
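The decision rule in steps 4 and 5 can be written down explicitly. This is a minimal sketch under two assumptions drawn from the flow above: only unsatisfied critical items block the deploy, and the pilot in command can still say no when everything is satisfied. The `Item` type and `go_no_go` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    critical: bool
    satisfied: bool


def go_no_go(items: list[Item], pilot_approves: bool) -> tuple[str, list[str]]:
    """Return ("GO" or "NO-GO", list of blocking reasons)."""
    reasons = [
        f"critical item not satisfied: {i.name}"
        for i in items
        if i.critical and not i.satisfied
    ]
    if not pilot_approves:
        reasons.append("pilot in command withheld approval")
    return ("GO" if not reasons else "NO-GO"), reasons


items = [
    Item("rollback plan tested", critical=True, satisfied=True),
    Item("runbook updated", critical=False, satisfied=False),
]
print(go_no_go(items, pilot_approves=True))  # ('GO', [])
```

Returning the reasons alongside the verdict gives the review in step 6 a record of why a deploy was held.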

Conclusion: Boring Is a Feature, Not a Bug

High‑risk deploys will never be entirely safe—but they can be systematically safer.

By:

  • Designing analog, pencil-drawn runways to visualize risk.
  • Building comprehensive, scripted preflight checks with tools like Ansible.
  • Using tactile, collaborative rituals to align teams on readiness.
  • Treating checklists as living documents that evolve after every learning moment.

…you move from “hopeful” deployments to disciplined takeoffs.

In aviation, checklists are not about trusting machines or humans; they’re about building a reliable partnership between them. Bring that same mindset to your deployments, and your systems—and your team—will be far more likely to stay aloft when it matters most.
