AI Code Review: Pre-Existing Bugs as a Side Quest

Running multiple AI models in parallel for PR review finds bugs you never knew you had. The catch? It makes you slower, not faster. And that is the point.

The prevailing AI coding narrative is simple: spew out code as fast as possible. Open massive PRs, merge them unvetted, ship it. Speed is the only metric that matters.

Nolan Lawson proposes the opposite in "Using AI to Write Better Code More Slowly." His thesis: LLMs are flexible enough to serve quality just as effectively as velocity. You can run them in parallel, cross-reference their findings, and catch bugs that no single reviewer would spot. The cost is time. The payoff is a codebase that actually improves instead of accumulating technical debt.

This is a strong counterpoint to the slop-cannon narrative. It maps directly onto patterns we already use in our own development workflows.

The triple-agent code review pattern

Nolan's approach is built around a Claude skill that runs three independent code reviewers in parallel: a Claude sub-agent, Codex, and Cursor Bugbot. Each reviews the same PR independently, without seeing the others' results. Once all three have reported back, the main agent cross-references their findings, filters out hallucinations, and produces a consolidated report.
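To make the shape of this concrete, here is a minimal sketch of the fan-out step. It is not Nolan's skill: `run_reviewer` is a hypothetical stand-in that returns canned data instead of calling any real tool, and the reviewer names, file locations, and findings are invented for illustration. The point is only that every reviewer gets the same input and none sees another's output before all have finished.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    reviewer: str  # which reviewer reported it
    location: str  # e.g. "cache.py:42"
    summary: str   # short description of the suspected bug


async def run_reviewer(name: str, prompt: str, diff: str) -> list[Finding]:
    """Hypothetical stand-in for one review tool.

    A real version would call the tool in question. This one ignores the
    prompt and diff and returns canned data so the sketch runs on its own.
    """
    await asyncio.sleep(0)  # placeholder for real I/O
    canned = {
        "claude-subagent": [("cache.py:42", "race condition on cache write")],
        "codex": [
            ("cache.py:42", "race condition on cache write"),
            ("api.py:7", "unused import"),
        ],
        "bugbot": [("cache.py:42", "race condition on cache write")],
    }
    return [Finding(name, loc, summary) for loc, summary in canned.get(name, [])]


async def review_in_parallel(
    prompt: str, diff: str, reviewers: list[str]
) -> dict[str, list[Finding]]:
    # Same prompt and diff for everyone; the reviews run concurrently and
    # are only collected once all of them have reported back.
    results = await asyncio.gather(
        *(run_reviewer(name, prompt, diff) for name in reviewers)
    )
    return dict(zip(reviewers, results))


reports = asyncio.run(
    review_in_parallel("Review this PR.", "<diff>", ["claude-subagent", "codex", "bugbot"])
)
for reviewer, findings in reports.items():
    print(reviewer, len(findings))
```

The consolidation step is deliberately left out here. In Nolan's setup that is the main agent's job, and it involves judgement that a few lines of code cannot capture.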

The prompt is identical across all three agents. It focuses on six categories in order of importance: functional bugs, KISS violations, DRY violations, missing tests, performance issues, and accessibility. Using the same lens for each agent means their disagreements are meaningful, not just a product of different instructions.
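One way to keep the lens identical is to build the prompt once and hand the same string to every reviewer. The sketch below assumes that setup. The six categories and their order come from the description above; the prompt wording and the function names are my own placeholders, not the text of Nolan's skill.

```python
# The six review categories, most important first.
REVIEW_CATEGORIES: tuple[str, ...] = (
    "functional bugs",
    "KISS violations",
    "DRY violations",
    "missing tests",
    "performance issues",
    "accessibility",
)


def build_review_prompt(categories: tuple[str, ...] = REVIEW_CATEGORIES) -> str:
    """Build one prompt string that every reviewer receives unchanged."""
    numbered = "\n".join(
        f"{i}. {name}" for i, name in enumerate(categories, start=1)
    )
    # Placeholder wording: an assumption, not the original prompt.
    return (
        "Review this pull request. Report issues in these categories, "
        "in order of importance:\n" + numbered
    )


def category_rank(category: str) -> int:
    """Lower rank means more important; unknown categories sort last."""
    try:
        return REVIEW_CATEGORIES.index(category)
    except ValueError:
        return len(REVIEW_CATEGORIES)


print(build_review_prompt())
```

`category_rank` is there so a consolidated report can be sorted in the same priority order the prompt asked for.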

The reason for running multiple models is statistical. A single model can hallucinate bugs. Two models that agree on the same bug are more likely to be correct. Three models that independently flag the same race condition are probably onto something real. This is not a new insight. The Milvus blog article on AI code review getting better when models debate demonstrated the same principle, and Nolan adapted his approach from that work.
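The agreement idea can be sketched as a simple vote count. This is a simplification under stated assumptions: it treats two findings as "the same bug" only when their location and summary match exactly, whereas real reviewers word things differently and the matching is itself a judgement call. The function name, the `min_agreement` threshold, and the sample data are hypothetical.

```python
from collections import defaultdict

# A finding here is just a (location, summary) pair.
Key = tuple[str, str]


def cross_reference(
    reports: dict[str, list[Key]], min_agreement: int = 2
) -> tuple[dict[Key, list[str]], dict[Key, list[str]]]:
    """Split findings into those flagged by at least `min_agreement`
    reviewers and those flagged by fewer."""
    seen_by: dict[Key, set[str]] = defaultdict(set)
    for reviewer, findings in reports.items():
        for key in findings:
            seen_by[key].add(reviewer)

    corroborated = {
        key: sorted(names)
        for key, names in seen_by.items()
        if len(names) >= min_agreement
    }
    needs_checking = {
        key: sorted(names)
        for key, names in seen_by.items()
        if len(names) < min_agreement
    }
    return corroborated, needs_checking


reports = {
    "reviewer_a": [("cache.py:42", "race condition")],
    "reviewer_b": [("cache.py:42", "race condition"), ("api.py:7", "off-by-one")],
    "reviewer_c": [("cache.py:42", "race condition")],
}
corroborated, needs_checking = cross_reference(reports)
print(corroborated)
print(needs_checking)
```

A finding that only one reviewer raised is not automatically a hallucination. It lands in `needs_checking` because it has less corroboration, not because it is wrong.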

Pre-existing bugs as a side quest

Here is where the approach gets counterintuitive. When you run a multi-model review on a PR, the agents do not limit themselves to the diff. They read the surrounding code, trace the logic, and uncover bugs that predate the change entirely.

Nolan calls this "pre-existing bugs as a side quest," and it is one of the more honest descriptions of AI code review I have read. His typical workflow is:

  1. Have an agent fix all critical and high-severity bugs
  2. Skip mediums where the fix would be disproportionate to the risk
  3. Abandon the PR entirely if it has so many criticals that the whole approach is wrongheaded
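The three steps above amount to a small triage policy, which can be sketched as follows. Several things here are assumptions rather than anything Nolan specifies: the threshold for "so many criticals" (he gives no number), the treatment of low-severity findings (skipped here), and the `fix_is_disproportionate` flag, which stands in for a human judgement that code cannot make for you.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass(frozen=True)
class Finding:
    summary: str
    severity: Severity
    # Set by a human after reading the finding; not computed.
    fix_is_disproportionate: bool = False


@dataclass
class Triage:
    decision: str  # "proceed" or "abandon"
    fix: list[Finding] = field(default_factory=list)
    skip: list[Finding] = field(default_factory=list)


def triage(findings: list[Finding], max_criticals: int = 5) -> Triage:
    """Apply the three-step workflow. `max_criticals` is a hypothetical cutoff."""
    criticals = [f for f in findings if f.severity is Severity.CRITICAL]
    if len(criticals) > max_criticals:
        # Step 3: too many criticals suggests the whole approach is wrong.
        return Triage(decision="abandon")

    result = Triage(decision="proceed")
    for f in findings:
        if f.severity in (Severity.CRITICAL, Severity.HIGH):
            result.fix.append(f)  # Step 1: always fix these.
        elif f.severity is Severity.MEDIUM and not f.fix_is_disproportionate:
            result.fix.append(f)  # Mediums are fixed unless the cost outweighs the risk.
        else:
            result.skip.append(f)  # Step 2, plus lows (an assumption).
    return result


findings = [
    Finding("race condition on cache write", Severity.CRITICAL),
    Finding("N+1 query in rarely used report", Severity.MEDIUM, fix_is_disproportionate=True),
    Finding("missing test for error path", Severity.MEDIUM),
]
outcome = triage(findings)
print(outcome.decision, len(outcome.fix), len(outcome.skip))
```

Skipped findings are kept in the result rather than dropped, so the decision not to fix something stays visible.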

The result is slower development, but more careful and more satisfying for it. Codebases improve over time instead of decaying. Engineers develop a deeper understanding of where the bodies are buried.

One developer in the comments described this perfectly: the most valuable thing an AI agent did for him was flag a vulnerability in his Row-Level Security policies that he would have shipped to production. Fixing it took hours and led to a deep dive on Postgres RLS he had not planned that week. Not a productivity win by any typical metric. But the agent review taught him more about that part of the stack than the documentation ever did.

The same pattern shows up repeatedly in the responses. The agents find the bugs; you still need to understand them to fix them. This makes the process inherently educational. The side quest is where the real learning happens.

Why this is a senior engineer move

The caveat that matters: this approach requires judgement. A junior developer who cannot distinguish a real race condition from a theoretical one will drown in the output. A senior engineer can triage instantly: fix the criticals, skip the noise, abandon a fundamentally wrong approach before investing more time in it.

Nolan touches on something important in his comments. Sometimes the bugs his agents find are along the lines of "if a future author adds a new enum here" or "if this job happens to run before this other job." These are often unlikely scenarios. But even in those cases, it is a code smell worth documenting. The value is in knowing what is there and making a conscious decision about it, not in fixing everything.

The discipline of running three reviewers in parallel and only then doing your own research is the opposite of the typical AI workflow. The usual pattern is: ask an agent, accept the first result, move on. Nolan's approach treats AI output as a starting point for investigation, not as a deliverable.

What this means for our pipeline

We already use some of these patterns. The agent cross-checking step in our development loops has one agent review another's work before CI runs. The review stage in the software factory pipeline blocks the merge if the reviewer agent does not approve. The sub-agent pattern for PR review shows how to dispatch focused review agents.

What we are missing is the multi-model aspect. Running different models against the same code and cross-referencing their findings is a natural extension. The existing PR review workflow dispatches a single agent for review. Changing it to dispatch three agents in parallel (one using the default model, one using Opus for deep reasoning, one using a different model family entirely) would mirror Nolan's approach while fitting our existing architecture.

The practical implementation is straightforward. The skill is already written and public. Adapting it to our toolchain means replacing Cursor Bugbot and Codex with whatever models our stack supports. The core insight (parallel independent review followed by cross-referencing and human validation) is tool-agnostic.

The antidote to vibe coding

You can use AI to churn out five-hundred-line PRs of barely-vetted code while the codebase rots from the inside. Or you can invest the time to use AI as a thorough, multi-perspective reviewer that teaches you about failure modes you did not know existed.

Nolan's approach is a blueprint for the second path. It is slower, more expensive, and requires more skill from the engineer running it. But the compounding effect on codebase quality is real. Every review that catches a pre-existing bug is debt repaid. Every side quest that teaches you something about your system is knowledge earned.

If you want to write better code, try writing it more slowly. Let the agents find the bugs you did not know you had. Then fix them, learn from them, and ship something that is actually better than what you started with.
