How AI Agents Made Me CLI-First

AI · 27 May 2026 · 9 min read

How AI agents pulled me into the terminal and changed how I build software, from nested loops and AFK workflows to risk-gated PRs and scheduled orchestration.

I used to sit in Safari checking gold prices. Bookmarks open, tabs multiplying, clicking around the page for the number I needed. Now I type goldcli price --today and the answer is there in a second. That one thing started a cascade. I started asking: what else am I doing in a browser that I could do from here?

The answer turned out to be most of my job.

AI agents are the reason. They live in the terminal. They read and write files, run commands, open PRs. To work with them effectively, I needed to meet them there. And the more I did, the more I realised the terminal is not just where the agents live. It is the better place to work, full stop. No clicking, no context switching, no "can I automate this?" The answer is always yes, because everything is a command.

This article is about the workflow that came out of that shift. How I build software now. How agents and humans share the work. And how the terminal became the centre of everything.

From Pipeline to Loops

flowchart TB
    subgraph Human["YOUR JOB - Human in the Loop"]
        direction LR
        Idea[Idea] --> Research[Research & Plan grill-me skill] --> PRD[PRD] --> Slice[Slice into AFK Issues]
    end
    subgraph Auto["AUTOMATED - Agents + Gates (no human)"]
        direction TB
        subgraph Agent["Agent Implementation Loop"]
            direction LR
            Pick[Pick AFK Issue] --> Implement[Implement TDD] --> Simplify[Simplify] --> Hooks[Pre-commit Hooks] --> OpenPR[Open PR]
        end
        subgraph Gate["Risk & Merge Gate"]
            direction TB
            Classify[Classify Risk] --> Low[Low Risk]
            Classify --> High[High Risk]
            Low --> AutoM[Squash + Auto-Merge]
            High --> Review[Human Review]
        end
        OpenPR --> Classify
    end
    subgraph Learn["SYSTEM LEARNS - Feedback Loop"]
        direction LR
        Capture[Capture Feedback] --> Update[Update Rules AGENTS.md / patterns] --> Better[Next Cycle Improves]
    end
    Slice --> Pick
    AutoM --> Capture
    Review --> Capture
    Better -.-> Pick
    style Human fill:#e3f2fd,stroke:#1565c0,color:#0d47a1
    style Auto fill:#e8f5e9,stroke:#2e7d32,color:#1b5e20
    style Agent fill:#c8e6c9,stroke:#2e7d32,color:#1b5e20
    style Gate fill:#c8e6c9,stroke:#2e7d32,color:#1b5e20
    style Learn fill:#fff3e0,stroke:#e65100,color:#bf360c

The pipeline still exists. But the shape has changed. It is now a system of stages, each with a different balance of human and agent involvement. The outer loop is HITL — you set direction, research, and slice the work. The inner loops are AFK — agents pick up issues and run with them. The risk gate decides whether a human needs to look at the output. And the learning loop feeds every result back into the system so the next cycle is better than the last.

This distinction between HITL (human in the loop) and AFK (away from keyboard) is the key design decision in every project now. Not what framework to use. Not what database. What should run with a human watching, and what should run on its own.

The Outer Loop: Ideas to Sliced Issues

Every feature starts the same way. An idea pops into my head. Could be from a bug I noticed. Could be from reading someone else's code. Could be from a conversation.

I open a terminal and start an opencode session. I use a skill called /plan-grill-me that interviews me about the idea. It asks the questions I would not think to ask myself. Edge cases. Dependencies. What success looks like. The session is conversational. We figure out the shape of the thing together.

When we have a solid plan, I write a PRD. A short document that states the requirements, the constraints, and what good looks like. Nothing long. Just enough to set direction.

Then I run /plan-to-prd. This skill takes the PRD and converts it into a parent GitHub issue. Then it slices that issue into smaller, individual implementation issues. Each one gets the afk label. Each one is small enough for an agent to own start to finish.
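As a rough sketch of what the slicing step produces, the Python below builds one `gh issue create` command per slice, each carrying the afk label. The `Slice` type, the `issue_commands` helper, and the "Parent: #N" body convention are assumptions made for illustration; the article does not describe how the skill works internally.

```python
from dataclasses import dataclass


@dataclass
class Slice:
    """One small, self-contained unit of work (hypothetical shape)."""
    title: str
    body: str


def issue_commands(parent_number: int, slices: list[Slice]) -> list[list[str]]:
    """Build one `gh issue create` argv per slice, each labelled afk."""
    commands = []
    for item in slices:
        # Linking back to the parent issue in the body is an assumed convention.
        body = f"{item.body}\n\nParent: #{parent_number}"
        commands.append([
            "gh", "issue", "create",
            "--title", item.title,
            "--body", body,
            "--label", "afk",
        ])
    return commands


if __name__ == "__main__":
    # Illustrative values only; nothing is sent to GitHub here.
    demo = [Slice("Add price cache", "Cache the last fetched price.")]
    for argv in issue_commands(1, demo):
        print(argv)
```

The function only builds argument lists, so the output can be inspected before anything is created.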

This outer loop is HITL by design. Direction setting is a human job. The agent helps me think, but I make the calls.

The Inner Loops: AFK Implementation

The sliced afk issues sit in the backlog. They get picked up on a schedule. Daily, weekly, monthly depending on the category.

flowchart TD
    Trigger[Scheduler Trigger] --> Pick[Pick Next AFK Issue]
    Pick --> Implement[Implement with TDD]
    Implement --> Simplify[Simplify Recently Changed Code]
    Simplify --> Hooks[Pre-commit Hooks Run]
    Hooks --> PR[Open Pull Request]
    PR --> Risk[Classify Risk]
    Risk --> High{High Risk?}
    High -->|Yes| Human[Flagged for Human Review]
    High -->|No| CI{CI Passes?}
    CI -->|Yes| Merge[Auto-Merge]
    CI -->|No| Wait[Wait for CI / Fix]
    Human --> Learn[Capture Feedback Update Rules]
    Merge --> Learn
    Learn -.-> Pick

Each issue gets its own fresh opencode session. The agent reads the issue, reads the context, implements the feature with TDD, simplifies the code, runs pre-commit hooks, and opens a pull request. No human touches it until the PR lands.

This is the part people find surprising. A full cycle from issue to PR with zero human intervention. It works because each issue is small, well-defined, and labelled AFK. The agent does not need to make judgement calls. It just executes.
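The first step of that cycle, picking the next issue, could be sketched like this. It assumes the backlog is read with `gh issue list` and that "next" means the oldest open issue; the selection rule and the function names are assumptions, not the actual setup.

```python
import json
import subprocess
from typing import Optional

# Real gh flags; the choice of JSON fields is an assumption for this sketch.
LIST_CMD = [
    "gh", "issue", "list",
    "--label", "afk",
    "--state", "open",
    "--json", "number,title,createdAt",
]


def pick_next(issues_json: str) -> Optional[dict]:
    """Return the oldest open AFK issue, or None when the backlog is empty."""
    issues = json.loads(issues_json)
    if not issues:
        return None
    # ISO 8601 timestamps in the same timezone sort correctly as strings.
    return min(issues, key=lambda issue: issue["createdAt"])


def fetch_backlog() -> str:
    """Ask the GitHub CLI for open AFK issues (needs gh installed and authenticated)."""
    result = subprocess.run(LIST_CMD, capture_output=True, text=True, check=True)
    return result.stdout
```

Keeping `pick_next` separate from the `gh` call means the selection logic can be tried against a saved JSON sample without touching a repository.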

There are also scheduled loops that create new AFK issues. An architecture review runs daily on some projects. It scans the codebase, finds areas that could be simplified or consolidated, and opens issues. A security review does the same for vulnerabilities. A code simplification pass runs and creates issues for functions that are too long or too nested. These loops generate their own work.

Agent Cross-Checking, Not CI

I used to rely heavily on CI. Write code, push, wait for the pipeline to tell me I missed a lint rule or a type error. Fix, push again, wait again. The feedback loop was minutes long.

That has changed. Within each AFK loop, agents check each other's work before anything reaches CI. The implement agent finishes. A simplify agent reviews the output. Pre-commit hooks catch formatting, types, and lint issues locally. By the time the PR opens, most failures are already caught.

CI is still there. But it is a final signature, not the primary safety net. The real quality check happens in the seconds after the code is written, inside the agent loop. This is faster and it catches more.
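One way to picture the local checks is as an ordered list of commands that stops at the first failure, so nothing is pushed until every stage passes. The sketch below is a minimal version under that assumption; the stage names are illustrative and the commands are stand-ins so the example runs anywhere.

```python
import subprocess
import sys
from typing import Optional


def first_failure(stages: list[tuple[str, list[str]]]) -> Optional[str]:
    """Run each stage's command in order; return the name of the first that fails."""
    for name, argv in stages:
        if subprocess.run(argv).returncode != 0:
            return name
    return None


# Stand-in commands. In a real loop an argv might be something like
# ["pre-commit", "run", "--all-files"]; that choice is an assumption here.
PASS = [sys.executable, "-c", "raise SystemExit(0)"]
FAIL = [sys.executable, "-c", "raise SystemExit(1)"]

if __name__ == "__main__":
    stages = [("simplify-review", PASS), ("pre-commit", FAIL), ("open-pr", PASS)]
    print(first_failure(stages))
```

Because the loop returns as soon as a stage fails, later stages such as opening the PR never run on code that did not pass the earlier ones.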

Risk Gating: What Needs a Human?

Not all PRs are equal. A documentation fix and a payment integration change should not be treated the same way. So every PR gets classified by risk.

flowchart TD
    Start[PR Opened] --> Files[Examine Touched Files]
    Files --> Labels[Check Issue Labels]
    Labels --> Decision{High Risk?}
    Decision -->|Files match: auth, payments, secrets, migrations, config| High[Label: risk-high]
    Decision -->|Labels match: feature, enhancement| High
    Decision -->|Neither| Low[Label: risk-low]
    High --> Block[Flag for Human Review]
    Low --> CIGreen{CI Green + No Deferred?}
    CIGreen -->|Yes| Auto[Squash Merge + Delete Branch]
    CIGreen -->|No| Wait[Wait or Fix]

Low risk: documentation, spelling fixes, simple refactors, dependency bumps. These auto-merge when CI passes.
High risk: touches auth, payments, secrets, database migrations, configuration, or the issue is labelled as a feature or enhancement. These are flagged for human review. No auto-merge.

The risk classification is automatic. File paths and issue labels determine the level. Some projects disable auto-merge entirely as a policy choice. The important thing is that the decision is explicit, not accidental. You decide what needs eyes and what does not.
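A minimal sketch of such a classifier, using the path keywords and labels listed above. Matching by substring is an assumption and is deliberately crude (a file called author.md would match "auth"); a real rule set would likely use path patterns. The "No Deferred" condition from the diagram is left out because the article does not define it.

```python
# Keywords and labels come from the article; the matching strategy is assumed.
HIGH_RISK_PATH_PARTS = ("auth", "payments", "secrets", "migrations", "config")
HIGH_RISK_LABELS = {"feature", "enhancement"}


def classify_risk(touched_files: list[str], issue_labels: list[str]) -> str:
    """Return "risk-high" if any file path or issue label matches, else "risk-low"."""
    for path in touched_files:
        lowered = path.lower()
        if any(part in lowered for part in HIGH_RISK_PATH_PARTS):
            return "risk-high"
    if HIGH_RISK_LABELS & {label.lower() for label in issue_labels}:
        return "risk-high"
    return "risk-low"


def may_auto_merge(risk: str, ci_green: bool) -> bool:
    """Only low-risk PRs with passing CI are eligible for auto-merge."""
    return risk == "risk-low" and ci_green


if __name__ == "__main__":
    risk = classify_risk(["src/payments/stripe.py"], ["bug"])
    print(risk, may_auto_merge(risk, ci_green=True))
```

A project that disables auto-merge as policy would simply make `may_auto_merge` return False.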

Continuous Learning

Every merge or review is a learning opportunity. When a human flags something in a high-risk PR, that feedback gets captured. A rule gets written. A pattern gets documented. When a low-risk PR auto-merges cleanly, that confirms the current rules are working.

The capture happens in the same session that handles the review. The agent updates AGENTS.md, or adds a new pattern to the project instructions, or records an edge case that was missed. The next time an agent encounters the same situation, it handles it correctly without a human needing to catch it again.
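The capture step can be as small as appending a rule to AGENTS.md without duplicating it. The sketch below assumes rules are stored as one Markdown bullet per line; the file layout and the `record_rule` helper are hypothetical.

```python
from pathlib import Path


def record_rule(agents_file: Path, rule: str) -> bool:
    """Append the rule as a bullet unless it is already present.

    Returns True when the file was changed.
    """
    bullet = f"- {rule.strip()}"
    existing = agents_file.read_text(encoding="utf-8") if agents_file.exists() else ""
    if bullet in existing.splitlines():
        return False
    # Make sure the new bullet starts on its own line.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    agents_file.write_text(existing + separator + bullet + "\n", encoding="utf-8")
    return True
```

Making the write idempotent matters once several workers run in parallel, since two reviews may surface the same lesson.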

This is what makes the system improve over time. Without the learning loop, agents plateau. They make the same mistakes repeatedly because nothing feeds back into their instructions. With it, every cycle makes the next one better.

The learning loop is also why the system works at scale. When you have multiple workers running in parallel, each one benefits from what the others learned. A pattern caught in one PR review becomes a rule that protects all future PRs.

Failure Modes

When an agent loop fails, what happens next depends on the kind of failure.

Stop and leave the branch: the default. Partial work has value. A human can pick it up and finish it.
Stop and remove the branch: for cleanup tasks or experiments. If the output has no value, remove it.
Retry once: transient failures. Network issues, rate limits. Try again.
Skip and continue: non-critical steps. If the SEO review fails, the architecture review still runs.

There is also a pattern called exit 75. It means "nothing to do, this is not an error." If no AFK issues exist in the backlog, the implementation loop exits cleanly. No noise, no alerts. It just tries again next time.
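The four policies and the exit 75 convention could be encoded roughly as follows. The enum names and the wrapper logic are assumptions for illustration. As it happens, 75 is also the value of EX_TEMPFAIL in the BSD sysexits.h convention, which signals a temporary condition worth retrying later.

```python
from enum import Enum

EXIT_NOTHING_TO_DO = 75  # the article's "nothing to do, not an error" code


class Failure(Enum):
    """Kinds of failure, named here for illustration only."""
    DEFAULT = "default"            # partial work has value
    NO_VALUE = "no_value"          # cleanup tasks or experiments
    TRANSIENT = "transient"        # network issues, rate limits
    NON_CRITICAL = "non_critical"  # e.g. an SEO review


POLICY = {
    Failure.DEFAULT: "stop-keep-branch",
    Failure.NO_VALUE: "stop-remove-branch",
    Failure.TRANSIENT: "retry-once",
    Failure.NON_CRITICAL: "skip-continue",
}


def exit_code_for(backlog_size: int) -> int:
    """An empty backlog exits with 75 instead of a generic failure code."""
    return EXIT_NOTHING_TO_DO if backlog_size == 0 else 0


def should_alert(exit_code: int) -> bool:
    """A scheduler wrapper stays quiet for success and for "nothing to do"."""
    return exit_code not in (0, EXIT_NOTHING_TO_DO)
```

A dedicated code lets the scheduler tell an idle run from a broken one without parsing logs.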

What Runs When

Different loops run on different cadences. The scheduler is the heartbeat of the whole system.
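A cadence table can be a few lines of data plus a function that decides what is due on a given day. In this sketch the loop names come from the reviews mentioned earlier, but the cadence assigned to each one, and the choice of Monday for weekly runs and the first of the month for monthly runs, are assumptions.

```python
from datetime import date

# Assumed cadences; the article only says the architecture review
# runs daily on some projects.
CADENCE = {
    "architecture-review": "daily",
    "security-review": "weekly",
    "code-simplification": "monthly",
}


def is_due(cadence: str, today: date) -> bool:
    """Daily always runs; weekly runs on Mondays; monthly runs on the 1st."""
    if cadence == "daily":
        return True
    if cadence == "weekly":
        return today.weekday() == 0
    if cadence == "monthly":
        return today.day == 1
    raise ValueError(f"unknown cadence: {cadence}")


def due_loops(today: date) -> list[str]:
    """Names of the loops that should run today, in table order."""
    return [name for name, cadence in CADENCE.items() if is_due(cadence, today)]


if __name__ == "__main__":
    print(due_loops(date.today()))
```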

gantt
    title Four Issues in Under an Hour
    dateFormat HH:mm
    axisFormat %H:%M
    tickInterval 10minute
    section Full Cycle
    Pick + Implement Issue A :a1, 10:00, 7m
    Open + Classify PR :a1r, after a1, 3m
    Auto-Merge :a1m, after a1r, 2m
    section Full Cycle
    Pick + Implement Issue B :b1, 10:15, 8m
    Open + Classify PR :b1r, after b1, 2m
    Auto-Merge :b1m, after b1r, 2m
    section Full Cycle
    Pick + Implement Issue C :c1, 10:30, 6m
    Open + Classify PR :c1r, after c1, 3m
    High Risk - Flag for Human :c1m, after c1r, 1m
    section Full Cycle
    Pick + Implement Issue D :d1, 10:45, 7m
    Open + Classify PR :d1r, after d1, 2m
    Auto-Merge :d1m, after d1r, 2m

Four issues in under an hour. Three go from backlog to merged code; the fourth, Issue C, is flagged for human review. No standup, no sprint planning, no ticket grooming. Each cycle takes ten to fifteen minutes. If the agent has work, it runs. If not, it exits cleanly and checks again later.

Now imagine scaling that. The same loop can run on multiple workers in parallel. Each worker picks up a different issue. They do not wait for each other. They do not block each other.

gantt
    title Parallel Workers Scale Throughput
    dateFormat HH:mm
    axisFormat %H:%M
    tickInterval 10minute
    section Worker 1
    Issue A Pick + Implement :w1a, 10:00, 7m
    Issue A Open + Classify + Merge :w1b, after w1a, 5m
    Issue E Pick + Implement :w1c, 10:20, 8m
    Issue E Open + Classify + Merge :w1d, after w1c, 4m
    section Worker 2
    Issue B Pick + Implement :w2a, 10:00, 8m
    Issue B Open + Classify + Merge :w2b, after w2a, 4m
    Issue F Pick + Implement :w2c, 10:20, 6m
    Issue F Open + Classify + Merge :w2d, after w2c, 5m
    section Worker 3
    Issue C Pick + Implement :w3a, 10:00, 6m
    Issue C Flag for Human Review :w3b, after w3a, 3m
    Issue G Pick + Implement :w3c, 10:20, 7m
    Issue G Open + Classify + Merge :w3d, after w3c, 4m
    section Worker 4
    Issue D Pick + Implement :w4a, 10:00, 7m
    Issue D Open + Classify + Merge :w4b, after w4a, 5m
    Issue H Pick + Implement :w4c, 10:20, 8m
    Issue H Open + Classify + Merge :w4d, after w4c, 4m

Eight issues in about half an hour. Four workers running the same loop in parallel, two issues each. The work is independent. The issues are small and pre-sliced. The bottleneck is not the implementation speed. It is how fast you can write good issues.
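The effect of adding workers can be checked with a small scheduling calculation. The per-issue durations below are read off the parallel chart above (implementation plus PR handling, in minutes). The sketch assumes each worker starts its next issue the moment it is free, whereas the chart starts the second batch at a fixed 10:20, so the numbers are a lower bound rather than a reproduction of the chart.

```python
import heapq

# Minutes per issue A..H, summed from the two bars each issue has in the chart.
DURATIONS = [12, 12, 9, 12, 12, 11, 11, 12]


def makespan(durations: list[int], workers: int) -> int:
    """Total elapsed minutes when each issue goes to the earliest-free worker."""
    if workers < 1:
        raise ValueError("need at least one worker")
    free_at = [0] * workers  # a list of equal values is already a valid heap
    finish = 0
    for minutes in durations:
        start = heapq.heappop(free_at)  # earliest time any worker is free
        end = start + minutes
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


if __name__ == "__main__":
    for count in (1, 2, 4):
        print(count, "workers:", makespan(DURATIONS, count), "minutes")
```

The calculation only holds because the issues are independent; any issue that had to wait on another would push the total back up.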

The old way meant waiting for the next sprint to start a task. The new way means an agent can pick up, implement, and merge four issues in the time it used to take to find a seat in the standup room.

CLI-First is the Side Effect

All of this works because every tool involved has a CLI. opencode. gh. git. td for Todoist. goldcli. Everything is a command.

The terminal is the universal adapter for agents. An agent cannot open a browser and click around. But it can run a command, parse the output, and act on it. When your workflow lives in the terminal, every part of it is automatable. There is no seam where a human has to step in because the tool requires a GUI.
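The run, parse, act pattern is short enough to show whole. In this sketch the command is a stand-in that prints JSON, so the example runs without any particular CLI installed; the `open_issues` field and the action names are hypothetical.

```python
import json
import subprocess
import sys


def run_json(argv: list[str]) -> dict:
    """Run a command and parse its standard output as JSON."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


# Stand-in for a real CLI that reports status as JSON.
FAKE_CLI = [sys.executable, "-c", "import json; print(json.dumps({'open_issues': 0}))"]


def next_action(status: dict) -> str:
    """Decide what to do from the parsed output."""
    return "exit-quietly" if status["open_issues"] == 0 else "start-loop"


if __name__ == "__main__":
    print(next_action(run_json(FAKE_CLI)))
```

Any tool that can print structured output fits the same three steps, which is why a GUI-only tool becomes the seam in an otherwise automated workflow.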

This is the side effect I did not expect. I started using agents in the terminal because that is where they work. I stayed because the terminal is genuinely better. Faster. More composable. More scriptable. The browser is the default for most people. The terminal is becoming the default for me.

Your New Job

Your job is no longer writing code. It is designing loop boundaries. Knowing which decisions need a human in the room and which ones an agent can own start to finish. Where to place the review gate. What cadence produces the right amount of work.

The old model of Agile assumed one pipeline with uniform tickets. Everyone picks from the same backlog. Everything moves through the same process. That model does not fit the new world. Some tickets are agent work. Some are human work. The scope changes from "all tickets go through sprint planning" to "some tickets never see a human until the PR lands."

Smaller teams win here. They communicate faster. They have fewer handoffs. They can design their loop boundaries without layers of process. AI amplifies that advantage.

The developers who thrive in this world are not the ones who write the most code. They are the ones who design the best loops.
