
How I Build Software With Agent Loops

The AI-assisted development workflow I use every day. From HITL planning to AFK implementation, risk-gated PRs, and continuous learning loops.

My IDE is a terminal. My pair programmer is an AI agent. My CI pipeline has more automation than a factory floor. This is how I build software now.

The workflow is not a straight line anymore. It is a system of nested loops. Some loops have a human in them. Some run fully autonomously. The skill is knowing which is which.

This article assumes you are comfortable with the idea of AI agents writing code. If you want to know why I do all of this from the terminal, read How AI Agents Made Me CLI-First. This one is about the workflow itself.

From Pipeline to Loops

flowchart TB
  subgraph Human["YOUR JOB: Human in the Loop"]
    direction LR
    Idea[Idea] --> Research[Research & Plan<br/>grill-me skill] --> PRD[PRD] --> Slice[Slice into<br/>AFK Issues]
  end
  subgraph Auto["AUTOMATED: Agents + Gates (no human)"]
    direction TB
    subgraph Agent["Agent Implementation Loop"]
      direction LR
      Pick[Pick AFK Issue] --> Implement[Implement TDD] --> Simplify[Simplify] --> Hooks[Pre-commit Hooks] --> OpenPR[Open PR]
    end
    subgraph Gate["Risk & Merge Gate"]
      direction TB
      Classify[Classify Risk] --> Low[Low Risk]
      Classify --> High[High Risk]
      Low --> AutoM[Squash + Auto-Merge]
      High --> Review[Human Review]
    end
    OpenPR --> Classify
  end
  subgraph Learn["SYSTEM LEARNS: Feedback Loop"]
    direction LR
    Capture[Capture Feedback] --> Update[Update Rules<br/>AGENTS.md / patterns] --> Better[Next Cycle<br/>Improves]
  end
  Slice --> Pick
  AutoM --> Capture
  Review --> Capture
  Better -.-> Pick

The pipeline still exists. But the shape has changed. It is now a system of stages, each with a different balance of human and agent involvement. The outer loop is HITL. You set direction, research, and slice the work. The inner loops are AFK. Agents pick up issues and run with them. The risk gate decides whether a human needs to look at the output. And the learning loop feeds every result back into the system so the next cycle is better than the last.

This distinction between HITL (human in the loop) and AFK (away from keyboard) is the key design decision in every project now. Not what framework to use. Not what database. What should run with a human watching, and what should run on its own.

The Outer Loop: Ideas to Sliced Issues

Every feature starts the same way. An idea pops into my head. Could be from a bug I noticed. Could be from reading someone else's code. Could be from a conversation.

I open a terminal and start an opencode session. I use a skill called /plan-grill-me that interviews me about the idea. It asks the questions I would not think to ask myself. Edge cases. Dependencies. What success looks like. The session is conversational. We figure out the shape of the thing together.

When we have a solid plan, I write a PRD. A short document that states the requirements, the constraints, and what good looks like. Nothing long. Just enough to set direction.

Then I run /plan-to-prd. This skill takes the PRD and converts it into a parent GitHub issue. Then it slices that issue into smaller, individual implementation issues. Each one gets the afk label. Each one is small enough for an agent to own start to finish.

This outer loop is HITL by design. Direction setting is a human job. The agent helps me think, but I make the calls.

The Inner Loops: AFK Implementation

The sliced afk issues sit in the backlog. They get picked up on a schedule: daily, weekly, or monthly, depending on the category.
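As a rough sketch, the "pick the next issue" step could look like the function below. The `Issue` shape and the oldest-first ordering are assumptions for illustration, not the actual scheduler; only the `afk` label comes from the workflow described here.

```typescript
// Hypothetical issue shape; a real tracker returns many more fields.
interface Issue {
  number: number;
  labels: string[];
  state: "open" | "closed";
  createdAt: string; // ISO 8601 UTC timestamp, so string order is time order
}

// Return the oldest open issue that carries the `afk` label, or undefined
// when the backlog holds nothing for an agent to do.
function pickNextAfkIssue(backlog: Issue[]): Issue | undefined {
  return backlog
    .filter((issue) => issue.state === "open" && issue.labels.includes("afk"))
    .sort((a, b) => a.createdAt.localeCompare(b.createdAt))[0];
}
```

Returning `undefined` rather than throwing keeps "empty backlog" a normal outcome instead of an error.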

flowchart TD
  Trigger[Scheduler Trigger] --> Pick[Pick Next AFK Issue]
  Pick --> Implement[Implement with TDD]
  Implement --> Simplify[Simplify Recently Changed Code]
  Simplify --> Hooks[Pre-commit Hooks Run]
  Hooks --> PR[Open Pull Request]
  PR --> Risk[Classify Risk]
  Risk --> High{High Risk?}
  High -->|Yes| Human[Flagged for Human Review]
  High -->|No| CI{CI Passes?}
  CI -->|Yes| Merge[Auto-Merge]
  CI -->|No| Wait[Wait for CI / Fix]
  Human --> Learn[Capture Feedback<br/>Update Rules]
  Merge --> Learn
  Learn -.-> Pick

Each issue gets its own fresh opencode session. The agent reads the issue, reads the context, implements the feature with TDD, simplifies the code, runs pre-commit hooks, and opens a pull request. No human touches it until the PR lands.

This is the part people find surprising. A full cycle from issue to PR with zero human intervention. It works because each issue is small, well-defined, and labelled AFK. The agent does not need to make judgement calls. It just executes.
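The ordering matters more than any single step. A minimal sketch of that ordering, assuming each step is a function that reports success or failure (the `Step` and `StepResult` types are hypothetical, and real steps would call the agent and the hooks):

```typescript
type StepResult = { ok: true } | { ok: false; reason: string };
type Step = { name: string; run: () => StepResult };

// Run the steps in order and stop at the first failure, so a later step
// (such as opening the PR) only happens when every earlier step succeeded.
function runLoop(steps: Step[]): { completed: string[]; failedAt?: string } {
  const completed: string[] = [];
  for (const step of steps) {
    const result = step.run();
    if (!result.ok) return { completed, failedAt: step.name };
    completed.push(step.name);
  }
  return { completed };
}
```

With steps named implement, simplify, pre-commit hooks, and open PR, a failing hook means the PR step never runs.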

There are also scheduled loops that create new AFK issues. An architecture review runs daily on some projects. It scans the codebase, finds areas that could be simplified or consolidated, and opens issues. A security review does the same for vulnerabilities. A code simplification pass runs and creates issues for functions that are too long or too nested. These loops generate their own work.

Agent Cross-Checking, Not CI

I used to rely heavily on CI. Write code, push, wait for the pipeline to tell me I missed a lint rule or a type error. Fix, push again, wait again. The feedback loop was minutes long.

That has changed. Within each AFK loop, agents check each other's work before anything reaches CI. The implement agent finishes. A simplify agent reviews the output. Pre-commit hooks catch formatting, types, and lint issues locally. By the time the PR opens, most failures are already caught.

CI is still there. But it is a final signature, not the primary safety net. The real quality check happens in the seconds after the code is written, inside the agent loop. This is faster, and in my experience it catches more.

Risk Gating: What Needs a Human?

Not all PRs are equal. A documentation fix and a payment integration change should not be treated the same way. So every PR gets classified by risk.

flowchart TD
  Start[PR Opened] --> Files[Examine Touched Files]
  Files --> Labels[Check Issue Labels]
  Labels --> Decision{High Risk?}
  Decision -->|Files match: auth, payments, secrets, migrations, config| High[Label: risk-high]
  Decision -->|Labels match: feature, enhancement| High
  Decision -->|Neither| Low[Label: risk-low]
  High --> Block[Flag for Human Review]
  Low --> CIGreen{CI Green + No Deferred?}
  CIGreen -->|Yes| Auto[Squash Merge + Delete Branch]
  CIGreen -->|No| Wait[Wait or Fix]

  • Low risk: documentation, spelling fixes, simple refactors, dependency bumps. These auto-merge when CI passes.
  • High risk: touches auth, payments, secrets, database migrations, configuration, or the issue is labelled as a feature or enhancement. These are flagged for human review. No auto-merge.

The risk classification is automatic. File paths and issue labels determine the level. Some projects disable auto-merge entirely as a policy choice. The important thing is that the decision is explicit, not accidental. You decide what needs eyes and what does not.
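A classifier along these lines can be very small. The sketch below takes the path categories and labels from the lists above; the matching rule (case-insensitive substring on the file path) and the function names are assumptions for illustration, and a real gate would likely use stricter path patterns.

```typescript
type Risk = "risk-high" | "risk-low";

// Fragments and labels mirror the high-risk list; substring matching is a
// deliberately crude assumption (it would also flag a file named "author.md").
const HIGH_RISK_PATHS = ["auth", "payment", "secret", "migration", "config"];
const HIGH_RISK_LABELS = ["feature", "enhancement"];

function classifyRisk(files: string[], labels: string[]): Risk {
  const touchesSensitive = files.some((file) =>
    HIGH_RISK_PATHS.some((fragment) => file.toLowerCase().includes(fragment)),
  );
  const labelledHigh = labels.some((label) => HIGH_RISK_LABELS.includes(label));
  return touchesSensitive || labelledHigh ? "risk-high" : "risk-low";
}

// Auto-merge needs all three: the project allows it, the PR is low risk,
// and CI is green. High-risk PRs never pass this check.
function canAutoMerge(risk: Risk, ciGreen: boolean, autoMergeEnabled = true): boolean {
  return autoMergeEnabled && risk === "risk-low" && ciGreen;
}
```

Erring toward false positives is the safer default here: a wrongly flagged PR costs a human a few minutes, while a wrongly auto-merged one may not be noticed at all.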

Continuous Learning

Every merge or review is a learning opportunity. When a human flags something in a high-risk PR, that feedback gets captured. A rule gets written. A pattern gets documented. When a low-risk PR auto-merges cleanly, that validates that the current rules are working.

The capture happens in the same session that handles the review. The agent updates AGENTS.md, or adds a new pattern to the project instructions, or records an edge case that was missed. The next time an agent encounters the same situation, it handles it correctly without a human needing to catch it again.
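The capture step can be as plain as appending a line to a file. A minimal sketch, assuming rules live in AGENTS.md as a bullet list (that layout and the `addRule` helper are assumptions, not the actual format of any particular project):

```typescript
// Append a rule to the text of AGENTS.md unless an identical rule is already
// present, so repeated feedback does not pile up duplicates.
function addRule(agentsMd: string, rule: string): string {
  const line = `- ${rule.trim()}`;
  const existing = agentsMd.split("\n").map((l) => l.trim());
  if (existing.includes(line)) return agentsMd;
  // Make sure the new bullet starts on its own line.
  const separator = agentsMd === "" || agentsMd.endsWith("\n") ? "" : "\n";
  return `${agentsMd}${separator}${line}\n`;
}
```

Making the update idempotent matters once several workers capture the same feedback in parallel.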

This is what makes the system improve over time. Without the learning loop, agents plateau. They make the same mistakes repeatedly because nothing feeds back into their instructions. With it, every cycle makes the next one better.

The learning loop is also why the system works at scale. When you have multiple workers running in parallel, each one benefits from what the others learned. A pattern caught in one PR review becomes a rule that protects all future PRs.

Failure Modes

When an agent loop fails, what happens next depends on the kind of failure.

  • Stop and leave the branch: the default. Partial work has value. A human can pick it up and finish it.
  • Stop and remove the branch: for cleanup tasks or experiments. If the output has no value, remove it.
  • Retry once: transient failures. Network issues, rate limits. Try again.
  • Skip and continue: non-critical steps. If the SEO review fails, the architecture review still runs.
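The four responses above can be written down as a small policy function. The failure categories and their names are assumptions for illustration; the mapping follows the list, with "keep the branch" as the fallback when a retry has already been used.

```typescript
type FailureKind = "partial-work" | "disposable-output" | "transient" | "non-critical";
type Action = "stop-keep-branch" | "stop-remove-branch" | "retry-once" | "skip-and-continue";

// `attempt` is 1 for the first try. A transient failure gets exactly one
// retry; after that it is treated like any other failure.
function onFailure(kind: FailureKind, attempt: number): Action {
  switch (kind) {
    case "disposable-output":
      return "stop-remove-branch";
    case "transient":
      return attempt === 1 ? "retry-once" : "stop-keep-branch";
    case "non-critical":
      return "skip-and-continue";
    default:
      // The default: partial work has value, so leave it for a human.
      return "stop-keep-branch";
  }
}
```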

There is also a pattern called exit 75. It means "nothing to do, this is not an error." If no AFK issues exist in the backlog, the implementation loop exits cleanly. No noise, no alerts. It just tries again next time.
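On the scheduler side, that convention reduces to a three-way split of the exit code. The sketch below is one way to express it; the function names are hypothetical. For context, `sysexits.h` defines 75 as `EX_TEMPFAIL`, a temporary condition where the caller is invited to try again later, which fits this use.

```typescript
// Exit code the loop uses for "nothing to do, not an error".
const EXIT_NOTHING_TO_DO = 75;

type RunOutcome = "success" | "idle" | "failure";

function interpretExit(code: number): RunOutcome {
  if (code === 0) return "success";
  if (code === EXIT_NOTHING_TO_DO) return "idle";
  return "failure";
}

// Only real failures should page anyone; an idle run stays silent.
function shouldAlert(code: number): boolean {
  return interpretExit(code) === "failure";
}
```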

Enforce, Don't Instruct

The most important lesson I have learned from building these agent loops is something Nick Nisi articulated well: enforce things, do not instruct them.

Prompts are soft. A prompt says "run the tests and verify they pass." A good agent follows that instruction. A lazy agent touches a marker file and says it ran the tests. An overwhelmed agent skips the step because it ran out of context.

The fix is not a better prompt. The fix is enforcement in code. Nick built a harness called Case that uses a TypeScript state machine to gate each phase. The implementer cannot move to the reviewer until the verifier has cryptographically proven the tests passed. The closer cannot merge until the reviewer has signed off. The agent cannot shortcut the process because the process is in code, not in instructions.
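To make the idea concrete, here is a minimal sketch of a phase gate in TypeScript. It is not Case's code or API; the phase names, the `Evidence` fields, and the `advance` function are all assumptions chosen to mirror the two rules described above.

```typescript
type Phase = "implement" | "verify" | "review" | "close" | "merged";

// Evidence is written by the harness, never by the agent being gated.
interface Evidence {
  testsPassed: boolean;      // recorded by the verifier
  reviewerApproved: boolean; // recorded by the reviewer
}

const NEXT: Record<Phase, Phase | null> = {
  implement: "verify",
  verify: "review",
  review: "close",
  close: "merged",
  merged: null,
};

// Move to the next phase, or throw when the required evidence is missing.
function advance(phase: Phase, evidence: Evidence): Phase {
  const next = NEXT[phase];
  if (next === null) throw new Error("already merged");
  if (next === "review" && !evidence.testsPassed) {
    throw new Error("verifier has not confirmed passing tests");
  }
  if (next === "merged" && !evidence.reviewerApproved) {
    throw new Error("reviewer has not signed off");
  }
  return next;
}
```

Because the only way forward is through `advance`, an agent that skips the tests simply stays in the verify phase, whatever it claims in its output.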

I use this same idea in my CI gates. The risk classification is code. The rule about "low risk PRs merge when CI passes" is code. A human can review a high-risk PR, but the merge itself still requires CI to be green. The enforcement is not in the agent prompt. It is in the automation.

The takeaway is simple. If a step matters, enforce it with code. A CI gate, a state machine, a validation script. Anything deterministic. The agent can still use judgement to decide what to build. But the gates that verify the work should be outside the agent's control.

Replace Trust With Evidence

Every agent in my system has to prove it did the work. I do not trust the output because the agent said "I checked it." I trust it because the system can verify the evidence.

For Nick, this meant SHA-256 hashing test output to prove tests ran. For my system, it means the agent opens a PR and CI runs before merge. The agent describes what it did in the PR body. But the merge gate does not trust the description. It trusts CI passing.
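As an illustration of the hashing idea, the sketch below records a SHA-256 digest next to captured test output and checks it later, using Node's built-in `crypto` module. The `TestEvidence` shape and function names are assumptions, and this is not how Case implements it. Note what the hash does and does not show: it detects output that was altered after recording, so the harness, not the agent, still has to be the one that runs the tests and records the result.

```typescript
import { createHash } from "node:crypto";

interface TestEvidence {
  output: string; // raw test runner output captured by the harness
  sha256: string; // hex digest of that output
}

function hashOutput(output: string): string {
  return createHash("sha256").update(output, "utf8").digest("hex");
}

// Called by the harness right after it runs the tests itself.
function recordEvidence(output: string): TestEvidence {
  return { output, sha256: hashOutput(output) };
}

// Called by a later gate: the digest must still match the output.
function verifyEvidence(evidence: TestEvidence): boolean {
  return hashOutput(evidence.output) === evidence.sha256;
}
```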

This changes how you build agent workflows. You stop asking "can the agent do this correctly?" and start asking "how can I make it prove it did this correctly?" The answer is usually a gated pipeline where each phase produces verifiable output before the next can start.

What Runs When

Different loops run on different cadences. The scheduler is the heartbeat of the whole system.

gantt
  title Four Issues in Under an Hour
  dateFormat HH:mm
  axisFormat %H:%M
  tickInterval 10minute
  section Full Cycle
    Pick + Implement Issue A :a1, 10:00, 7m
    Open + Classify PR :a1r, after a1, 3m
    Auto-Merge :a1m, after a1r, 2m
  section Full Cycle
    Pick + Implement Issue B :b1, 10:15, 8m
    Open + Classify PR :b1r, after b1, 2m
    Auto-Merge :b1m, after b1r, 2m
  section Full Cycle
    Pick + Implement Issue C :c1, 10:30, 6m
    Open + Classify PR :c1r, after c1, 3m
    High Risk - Flag for Human :c1m, after c1r, 1m
  section Full Cycle
    Pick + Implement Issue D :d1, 10:45, 7m
    Open + Classify PR :d1r, after d1, 2m
    Auto-Merge :d1m, after d1r, 2m

Four issues in under an hour: three merged and one flagged for human review. No standup, no sprint planning, no ticket grooming. Each cycle takes ten to fifteen minutes. If the agent has work, it runs. If not, it exits cleanly and checks again later.

Now imagine scaling that. The same loop can run on multiple workers in parallel. Each worker picks up a different issue. They do not wait for each other. They do not block each other.

gantt
  title Parallel Workers Scale Throughput
  dateFormat HH:mm
  axisFormat %H:%M
  tickInterval 10minute
  section Worker 1
    Issue A Pick + Implement :w1a, 10:00, 7m
    Issue A Open + Classify + Merge :w1b, after w1a, 5m
    Issue E Pick + Implement :w1c, 10:20, 8m
    Issue E Open + Classify + Merge :w1d, after w1c, 4m
  section Worker 2
    Issue B Pick + Implement :w2a, 10:00, 8m
    Issue B Open + Classify + Merge :w2b, after w2a, 4m
    Issue F Pick + Implement :w2c, 10:20, 6m
    Issue F Open + Classify + Merge :w2d, after w2c, 5m
  section Worker 3
    Issue C Pick + Implement :w3a, 10:00, 6m
    Issue C Flag for Human Review :w3b, after w3a, 3m
    Issue G Pick + Implement :w3c, 10:20, 7m
    Issue G Open + Classify + Merge :w3d, after w3c, 4m
  section Worker 4
    Issue D Pick + Implement :w4a, 10:00, 7m
    Issue D Open + Classify + Merge :w4b, after w4a, 5m
    Issue H Pick + Implement :w4c, 10:20, 8m
    Issue H Open + Classify + Merge :w4d, after w4c, 4m

Eight issues in about half an hour. Four workers running the same loop in parallel. The work is independent. The issues are small and pre-sliced. The bottleneck is not the implementation speed. It is how fast you can write good issues.
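The arithmetic behind this is easy to check. The sketch below is a simplified model, not the real scheduler: it hands each issue to whichever worker frees up first and ignores scheduler ticks. The cycle lengths are read off the parallel chart above.

```typescript
// Assign each issue, in order, to the worker that becomes free first and
// return the minute at which the last issue finishes.
function makespan(durations: number[], workers: number): number {
  if (workers < 1) throw new Error("need at least one worker");
  const freeAt = new Array<number>(workers).fill(0);
  for (const minutes of durations) {
    const earliest = freeAt.indexOf(Math.min(...freeAt));
    freeAt[earliest] += minutes;
  }
  return Math.max(...freeAt);
}

// Full cycle length in minutes for issues A to H, taken from the chart.
const cycles = [12, 12, 9, 12, 12, 11, 11, 12];

console.log(makespan(cycles, 1)); // 91
console.log(makespan(cycles, 4)); // 24
```

Under this model one worker needs 91 minutes and four workers need 24. The charted run takes a little longer because the second batch starts at 10:20 instead of the moment a worker is free.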

The old way meant waiting for the next sprint to start a task. The new way means an agent can pick up, implement, and merge four issues in the time it used to take to find a seat in the standup room.

Your New Job

Your job is no longer writing code. It is designing loop boundaries. Knowing which decisions need a human in the room and which ones an agent can own start to finish. Where to place the review gate. What cadence produces the right amount of work.

The old model of Agile assumed one pipeline with uniform tickets. Everyone picks from the same backlog. Everything moves through the same process. That model does not fit the new world. Some tickets are agent work. Some are human work. The scope changes from "all tickets go through sprint planning" to "some tickets never see a human until the PR lands."

Smaller teams win here. They communicate faster. They have fewer handoffs. They can design their loop boundaries without layers of process. AI amplifies that advantage.

The developers who thrive in this world are not the ones who write the most code. They are the ones who design the best loops.
