Rain Lag

The One-Page Risk Radar: Sketching Failure Hotspots Before You Write a Single Line of Code

How a simple one-page risk radar can reveal your software’s most dangerous failure hotspots—before you write a single line of code—and help you avoid the next high‑profile architectural disaster.


Most teams start with a vision, a backlog, and maybe a high-level architecture diagram. Very few start with a map of how their system could fail—before a single line of code is written.

That’s a problem.

Modern systems rarely fail because someone forgot a semicolon. They fail because architectural, security, or process risks stayed invisible until it was too late. Think of the strain on collaboration tools when COVID-19 hit, or the fallout from security missteps like Okta’s 2023 session hijacking issues. Those were not “oops, we mistyped a variable” failures; they were architectural and systemic risk failures.

A simple antidote is the one-page risk radar: a visual, brutally honest picture of where your future system is most likely to break.


What Is a One-Page Risk Radar?

A risk radar is a single-page heat map of potential failure hotspots. You capture all kinds of risks—technical, architectural, security, process, operational—and plot them on a grid with two axes:

  • Likelihood (Probability): How likely is this risk to materialize?
  • Impact (Consequence): How bad will it be if it does?

Each risk becomes a point on this 2D map. The closer a risk sits to the high-likelihood, high-impact corner, the more dangerous it is.

The magic is in its simplicity:

  • One page forces prioritization.
  • Visual layout exposes clusters of trouble.
  • You can create it before you build anything.

Instead of guessing which tasks are important, you have a visual failure forecast that tells you where early effort buys you the most stability.


Why Do This Before Writing Code?

Running a risk radar exercise before coding is like giving your system a physical examination before it enters the wild.

  • You catch structural weaknesses before they become expensive rewrites.
  • You expose security gaps before attackers do.
  • You align the team on what can go wrong, not just what you hope will go right.

Good architecture is often invisible when it works. For example, Zoom’s ability to scale during COVID-19 looked almost effortless from the outside, but it rested on years of architectural decisions aimed at scalability, fault tolerance, and performance. Those choices gave the platform a kind of quiet resilience when demand spiked.

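By contrast, risks ignored early tend to resurface as public failures. In 2023, attackers who gained access to Okta’s customer support system obtained session tokens from troubleshooting files that customers had uploaded, and some of those tokens were used in attempts to hijack customer sessions. The incident highlighted how architectural and security risks around session management, token handling, and integration boundaries can turn into high-profile incidents when they are not surfaced and mitigated early.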

A risk radar helps you front-load that thinking. It doesn’t prevent all issues, but it dramatically increases the chance that your worst problems are addressed while they’re still cheap to fix.


Step 1: Collect Risks from Every Angle

Start by brainstorming anything that could materially hurt the success of the system. Don’t limit yourself to bugs or features; you’re mapping failure modes, not tasks.

Include risks from at least these buckets:

  • Technical
    • Performance bottlenecks
    • Complex concurrency or state management
    • Third-party dependencies and integration fragility
  • Architectural
    • Single points of failure
    • Scaling limits (data, users, locations)
    • Weak separation of concerns and unclear boundaries
  • Security & Privacy
    • Authentication and authorization design
    • Session and token handling (e.g., the kind of issues Okta faced)
    • Data protection, encryption, and compliance obligations
  • Operational
    • Observability gaps (logs, metrics, traces)
    • Disaster recovery and backup plans
    • Deployment, rollback, and configuration complexity
  • Process & People
    • Lack of clear ownership
    • Unproven team skills or new technologies
    • Weak review or testing practices

Write every risk as a short, concrete statement:

“Session tokens might be reusable across devices without proper revocation, enabling session hijacking.”
“Database writes may not scale beyond 10x current projected load.”
“Only one engineer understands the deployment pipeline; bus factor = 1.”

Don’t filter yet. Volume first, judgment later.
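If you capture the brainstorm in a shared doc or a small script instead of on sticky notes, a minimal risk register makes it easy to spot buckets nobody has filled yet, without filtering anything out. The sketch below is a hypothetical Python illustration: the `Risk` class, the short `CATEGORIES` labels, and `empty_buckets` are names we made up, not part of any tool or of the method itself.

```python
from collections import Counter
from dataclasses import dataclass

# The five buckets from Step 1; these short labels are our own choice
# ("process" stands for "Process & People").
CATEGORIES = ("technical", "architectural", "security", "operational", "process")


@dataclass(frozen=True)
class Risk:
    statement: str  # a short, concrete failure statement
    category: str   # one of CATEGORIES

    def __post_init__(self) -> None:
        # Reject typos so a risk never lands in a bucket nobody looks at.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


def empty_buckets(risks: list[Risk]) -> list[str]:
    """Return the categories with no risks yet: likely blind spots."""
    counts = Counter(risk.category for risk in risks)
    return [category for category in CATEGORIES if counts[category] == 0]


register = [
    Risk("Session tokens might be reusable across devices without proper "
         "revocation, enabling session hijacking.", "security"),
    Risk("Database writes may not scale beyond 10x current projected load.",
         "architectural"),
    Risk("Only one engineer understands the deployment pipeline; bus factor = 1.",
         "process"),
]

print(empty_buckets(register))  # ['technical', 'operational']
```

An empty bucket does not mean the system is safe in that area; it usually means nobody in the room has thought about it yet.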


Step 2: Score Each Risk by Likelihood and Impact

Now, give each risk two scores:

  • Likelihood: How probable is this risk if we do nothing? (1 = very unlikely, 5 = very likely)
  • Impact: How bad is it if it occurs? (1 = minor inconvenience, 5 = existential or severe)

You can keep the scale simple:

  • 1–2: Low
  • 3: Medium
  • 4–5: High
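The banding rule above is simple enough to encode directly, which keeps everyone reading scores the same way. A minimal Python sketch, assuming scores are whole numbers from 1 to 5; the function name `band` is hypothetical.

```python
def band(score: int) -> str:
    """Map a 1-5 likelihood or impact score to the Low/Medium/High bands."""
    if not isinstance(score, int) or not 1 <= score <= 5:
        raise ValueError(f"score must be an integer from 1 to 5, got {score!r}")
    if score <= 2:
        return "Low"
    if score == 3:
        return "Medium"
    return "High"  # 4 or 5


print([band(s) for s in range(1, 6)])  # ['Low', 'Low', 'Medium', 'High', 'High']
```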

What matters is consistency and comparative ranking, not absolute precision. If possible, invite representatives from architecture, security, product, and operations; different perspectives catch different risks.
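One lightweight way to combine several perspectives, assuming each participant scores a risk independently before discussion, is to take the median and flag wide disagreement for a conversation rather than averaging it away. This Python sketch uses made-up votes; the median rule, the `max_spread` threshold of 1, and the name `consensus` are our assumptions, not part of the method.

```python
from statistics import median

# Hypothetical likelihood scores for one risk, one entry per participant.
votes = {"architecture": 4, "security": 5, "product": 2, "operations": 4}


def consensus(scores: dict[str, int], max_spread: int = 1) -> tuple[float, bool]:
    """Return the median score and whether the group disagrees.

    A spread (max - min) above max_spread is flagged so the team talks it
    through instead of silently smoothing over a minority view.
    """
    if not scores:
        raise ValueError("need at least one score")
    values = list(scores.values())
    spread = max(values) - min(values)
    return float(median(values)), spread > max_spread


print(consensus(votes))  # (4.0, True)
```

Here the product representative’s 2 against security’s 5 is exactly the kind of gap worth five minutes of discussion: one of them probably knows something the other does not.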

Example:

  • "Session tokens can be reused across devices" → Likelihood: 4, Impact: 5
  • "Reports might render slightly slower than expected" → Likelihood: 4, Impact: 2
  • "Our cache cluster might not scale beyond 3 nodes" → Likelihood: 3, Impact: 4

You’ll quickly see which ones feel dangerous.


Step 3: Plot the Risk Radar Heat Map

Take a blank grid with Impact on the horizontal axis and Likelihood on the vertical axis, both running from 1 to 5. Then:

  1. Plot each risk as a point at its Impact score (across) and Likelihood score (up).
  2. Use color or size to encode additional meaning if you like:
    • Color by category (tech, security, process, etc.).
    • Larger dots for higher combined scores (Likelihood × Impact).

Even a rough sketch on a whiteboard or paper works; you don’t need a fancy tool.
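If nobody has a whiteboard handy, a plain-text grid is enough to see clusters. This is a hypothetical Python helper (`render_radar` is our own name), assuming Impact runs along the horizontal axis and Likelihood up the vertical axis, which puts the danger zone in the top-right corner. Each cell shows how many risks share that score pair.

```python
def render_radar(risks: dict[str, tuple[int, int]]) -> str:
    """Render risks as a 5x5 text heat map.

    `risks` maps a short label to (likelihood, impact), both 1-5.
    Rows are likelihood (5 at the top) and columns are impact (1 on the
    left), so the danger zone is the top-right corner. Each cell shows how
    many risks landed there, or '.' if none.
    """
    counts = [[0] * 5 for _ in range(5)]
    for label, (likelihood, impact) in risks.items():
        if not (1 <= likelihood <= 5 and 1 <= impact <= 5):
            raise ValueError(f"{label!r}: scores must be 1-5")
        counts[5 - likelihood][impact - 1] += 1

    lines = []
    for row_index, row in enumerate(counts):
        cells = " ".join(str(n) if n else "." for n in row)
        lines.append(f"L{5 - row_index} | {cells}")
    lines.append("     " + "-" * 9)
    lines.append("     1 2 3 4 5  <- Impact")
    return "\n".join(lines)


print(render_radar({
    "Session tokens reusable across devices": (4, 5),
    "Reports render slightly slower": (4, 2),
    "Cache cluster stuck at 3 nodes": (3, 4),
}))
```

Counting per cell rather than printing labels keeps the grid readable when a workshop produces dozens of risks; the labels stay in the register.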

You’ll typically see three zones:

  • Top-right (High likelihood, High impact): The danger zone. These are early, non-negotiable focus areas.
  • Top-left (High likelihood, Low impact): Nuisances. Worth addressing with automation or guardrails.
  • Bottom-right (Low likelihood, High impact): Black swans. Think disaster recovery and contingency planning.

What matters is relative position: where are your true hotspots?
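To make the zone boundaries explicit, here is a minimal Python sketch of a classifier. It assumes “high” means a score of 4 or 5, matching the bands in Step 2, and treats everything else as “lower”; the function name `zone` and the fourth label, `monitor`, for the bottom-left corner are our own additions.

```python
def zone(likelihood: int, impact: int) -> str:
    """Classify a risk into a radar zone.

    Assumes "high" means a score of 4 or 5 (the High band from Step 2);
    any other score counts as "lower".
    """
    for score in (likelihood, impact):
        if not 1 <= score <= 5:
            raise ValueError(f"scores must be 1-5, got {score}")
    high_likelihood, high_impact = likelihood >= 4, impact >= 4
    if high_likelihood and high_impact:
        return "danger zone"
    if high_likelihood:
        return "nuisance"
    if high_impact:
        return "black swan"
    return "monitor"  # bottom-left: lower likelihood and lower impact


examples = {
    "Session tokens can be reused across devices": (4, 5),
    "Reports might render slightly slower than expected": (4, 2),
    "Our cache cluster might not scale beyond 3 nodes": (3, 4),
}
for name, (lik, imp) in examples.items():
    print(f"{zone(lik, imp):12} {name}")
```

With these thresholds, the cache-cluster risk (Likelihood 3, Impact 4) lands among the high-impact, lower-likelihood risks; a team that considers a 3 “high” would move the cutoff and see it in the danger zone instead.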


Step 4: Turn the Radar into Early Action

A one-page radar is useless if it doesn’t change decisions. Use it to drive what you do next.

For risks in the danger zone (high likelihood, high impact), consider:

  • Architecture spikes: Small, time-boxed experiments to validate assumptions.
  • Design changes: Introducing boundaries, queues, caching, stronger isolation, or redundancy.
  • Security hardening: Stronger session controls, token lifetimes, revocation mechanisms, and audit trails.
  • Explicit non-goals: Deciding not to support certain scenarios initially to reduce exposure.

For high-impact but lower-likelihood risks:

  • Add disaster recovery and failover plans.
  • Strengthen observability to get early warning.
  • Document runbooks: what to do if this actually happens.

For high-likelihood but lower-impact issues:

  • Automate workarounds.
  • Improve tooling, tests, and processes.

Crucially, revisit the radar periodically:

  • After major architectural decisions
  • Before big releases
  • When incidents teach you something new

Your risk radar is a living artifact, not a one-time workshop output.
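Because the radar is revisited, it helps to see what moved between versions. A hypothetical Python sketch, assuming each version is saved as a mapping from risk statement to (likelihood, impact); `radar_diff` and the two sample snapshots are invented for illustration.

```python
def radar_diff(old: dict[str, tuple[int, int]],
               new: dict[str, tuple[int, int]]) -> dict[str, list[str]]:
    """Compare two radar snapshots keyed by risk statement.

    Each snapshot maps a risk to (likelihood, impact). Returns the risks
    that were added, removed, or re-scored between the two versions.
    """
    changed = [
        f"{risk}: {old[risk]} -> {new[risk]}"
        for risk in sorted(old.keys() & new.keys())
        if old[risk] != new[risk]
    ]
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": changed,
    }


# Made-up snapshots: a mitigation lowered token-reuse likelihood, and a
# new people risk appeared.
v0_1 = {"Session token reuse": (4, 5), "Cache cluster limit": (3, 4)}
v0_2 = {"Session token reuse": (2, 5), "Bus factor = 1": (4, 4)}
print(radar_diff(v0_1, v0_2))
```

A drop in likelihood with unchanged impact, as in the token example, is what a successful mitigation usually looks like; a risk that simply disappears from the list deserves a note on why.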


Learning from Quiet Wins and Loud Failures

The contrast between systems like Zoom and incidents like Okta’s session hijacking problems reveals the value of early risk thinking:

  • Zoom benefited from strong, deliberate architectural choices around scalability and reliability long before the pandemic. When user counts exploded, those previously invisible decisions produced quiet resilience.
  • Okta’s 2023 issues around session hijacking highlighted what can happen when session and token risks aren’t fully surfaced or mitigated at design time. Once such an issue is in production, fixes must be layered on top of a running system—under scrutiny, with customers already affected.

A risk radar doesn’t guarantee you’ll become the next Zoom or avoid all Okta-style incidents. But it significantly increases the odds that:

  • You recognize your equivalent of session management or scaling as a critical hotspot early.
  • You invest in the right places before stressors (traffic spikes, attackers, compliance changes) arrive.

It replaces vague optimism with a visible, shared understanding of where you’re vulnerable.


How to Start Tomorrow

You don’t need a new tool or a big ceremony to adopt a risk radar. You can start with:

  1. A whiteboard or shared doc titled: “Risk Radar – v0.1”.
  2. One hour with key people: architect, tech lead, security, product, ops.
  3. A simple grid: Likelihood (1–5) vs Impact (1–5).
  4. 20–30 risks across technical, architectural, security, process, and operational dimensions.
  5. Plot them, then pick the five most dangerous risks, starting from the top-right corner of the heat map.
  6. Turn those 5 into concrete actions: spikes, design changes, tests, or security reviews.
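If the scores live in a file, picking the shortlist can be automated. This Python sketch ranks by the combined Likelihood × Impact score mentioned in Step 3, which naturally favors the top-right corner; the name `top_risks` and the tie-break on higher impact are our assumptions.

```python
def top_risks(risks: dict[str, tuple[int, int]], n: int = 5) -> list[tuple[str, int]]:
    """Rank risks by Likelihood x Impact, breaking ties by higher impact.

    `risks` maps a statement to (likelihood, impact), both 1-5.
    Returns up to n (statement, combined score) pairs, highest first.
    """
    ranked = sorted(
        risks.items(),
        key=lambda item: (item[1][0] * item[1][1], item[1][1]),
        reverse=True,
    )
    return [(name, lik * imp) for name, (lik, imp) in ranked[:n]]


examples = {
    "Session tokens can be reused across devices": (4, 5),
    "Reports might render slightly slower than expected": (4, 2),
    "Our cache cluster might not scale beyond 3 nodes": (3, 4),
}
for name, score in top_risks(examples):
    print(score, name)
```

For the three examples from Step 2 this orders them 20, 12, 8: session tokens first, then the cache cluster, then slow reports. Breaking ties by impact means that, between two risks with the same product, the one that would hurt more comes first.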

Repeat as your understanding evolves.


Conclusion: Draw the Map Before You Build the City

Software systems rarely collapse for a single, simple reason. They fail where unseen risks intersect with unexpected stress.

A one-page risk radar is your chance to draw the failure map before you build the city. By visualizing risks on a heat map of likelihood and impact, you:

  • Expose hidden architectural, security, and process weaknesses early.
  • Prioritize effort on the most dangerous failure modes.
  • Build the kind of quiet resilience that only becomes visible when the world suddenly leans on your system.

Before your next big project kicks off, resist the urge to jump straight into user stories and implementation. Take an hour to sketch your risk radar.

You may discover that the most valuable thing you build this week isn’t code at all—it’s a clearer view of where everything might break, and a plan to make sure it doesn’t.
