Hamilton Ridley Consulting

Field Notes · Vol. 04

Working with AI Agents
For New Builders · 2026

A short guide

AI is a force multiplier.

Daniel Kemp·Published May 5, 2026·9 min read

Sharper builders use AI to ship faster than they ever could. Sloppy builders use AI to ship more bugs. AI doesn't change the ratio between those two — it amplifies whatever discipline you bring. The skill that decides which one you become is the loop: the tools you pick, the context you feed, the verification you run, and the rhythm you keep.

The trap most beginners hit

AI agents write plausible code by default. Plausible isn't correct. The agent's confidence is not the agent's accuracy. The biggest productivity unlock in 2026 isn't a smarter model — it's a sharper review pass.

Lane 01

Tool

Match the environment to the job

Lane 02

Context

What the agent knows

Lane 03

Verify

Catch what looked right

Lane 04

Session loop

Plan, execute, review, ship

§ 01

The four lanes

Lane 01

Tool

The environment your AI agent runs in. Browser studio, desktop IDE, or terminal CLI — each has a different friction profile and a different ceiling.

What changes between optionsSpeed-to-ship vs. precision. Browser studios are the fastest start; terminal CLIs give you the most control over a real folder. Most builders end up using more than one — pick the right one for the job in front of you, not the one you used last time.

Replit Agent / Lovable

Browser studio. Idea to live URL in minutes. Best for prototypes.

Cursor / Windsurf

Desktop IDE w/ AI in every panel. The daily-driver middle ground.

Claude Code (CLI)

Terminal agent on your real folder. Runs commands, reads logs, owns Git.

Codex CLI

Same shape, different model. Useful as a second opinion when one's stuck.

Lane 02

Context

What the agent knows during a session. Files, project rules, prior conversation, related docs. The context window has a budget — what you spend it on shapes everything else.

What changes between optionsHow disciplined you are about what enters the conversation. More noise = more drift. Less noise = sharper output, fewer hallucinations.

CLAUDE.md / project rules

Persistent rules across sessions. Conventions, what NOT to touch.

@-reference specific files

Point at what's relevant. Don't dump the whole repo.

/clear between tasks

Fresh context for each unrelated problem. Cheap reset, big wins.

Sub-agents

Delegate research and let them return summaries. Protects main context.

Lane 03

Verify

AI agents write plausible-by-default code. Plausible isn't correct. Catching the difference is your job; the agent's confidence isn't the agent's accuracy.

What changes between optionsHow systematically you check before committing. "It looked right" is the most expensive review pass in software.

Read the diff

Line by line. Every change. No exceptions. Use a real diff viewer.

Run the tests

Let them catch what your eyes miss. Add tests for behavior you care about.

Smoke-test the feature

Actually click through it. Or actually run the script. End-to-end.

Watch for deletions

Agents sometimes remove unrelated code. Diff catches it; trust catches nothing.

Lane 04

Session loop

A productive session has rhythm: plan, execute, review, commit. Skip steps and you ship surprises — to your codebase, your branch, or your client.

What changes between optionsHow tight the loop is. Smaller plans, smaller commits, fewer surprises. Long unsupervised sessions are where the silent breakage happens.

Plan first

Agree on the approach before code is written. "Plan mode" or just a TODO list.

Small commits

Review at milestones. Easy to revert. AI agents commit often if you let them.

/clear between features

Fresh state, no cross-contamination. Old context bleeds into new tasks.

Rubber-duck recap

Ask the agent to summarize what it did. Catches the thing it forgot.

§ 02

The same three projects, working with AI

Project A

Contractor lead form

One page, one form, one owner. You want it live by Friday. You don't need a process; you need ergonomics.

Optimize for speed

01 · Tool

Replit Agent

Browser, no setup, live preview while you iterate. One-shot conversation.

02 · Context

Just the prompt

No CLAUDE.md needed. Single session means context is whatever's in the chat. Don't over-engineer this stage.

03 · Verify

Smoke-test it

Fill the form yourself, confirm the email lands. If it works for you, it works for the contractor.

04 · Loop

One sitting

Plan + execute + ship in an afternoon. Tight loop on a small project doesn't need ceremony.

Project B

Multi-user CRM

Eight people log in daily, real revenue runs through it, you'll be reading this code in two years. The session-to-session discipline is what makes future-you not hate present-you.

Optimize for durability

01 · Tool

Claude Code + Sonnet

CLI on your real folder. Sonnet for daily work; Opus when you're making an architecture call. The agent should own Git.

02 · Context

CLAUDE.md + sub-agents

Project rules persist across sessions: naming, what's production, what's scratch. Sub-agents for "go look at how X works in this codebase" without burning main context. Discipline pays compound.

03 · Verify

Diff + tests + staging

Diff review every commit. Playwright on critical paths. Vercel preview URL before main. Three-layer net for anything customers depend on.

04 · Loop

Plan, then execute

Any non-trivial work starts with a written plan. /clear between unrelated features. Small commits. Big sessions are where silent breakage lives.

Project C

Nightly automation

Pulls four APIs at 2 a.m., transforms data, sends a report. Nobody's watching. Subtle wrong is worse than loud broken — silent bad data corrupts decisions for weeks.

Optimize for correctness

01 · Tool

Claude Code, model-switch

CLI for the real folder. Switch to Opus for the tricky transformation logic where reasoning matters; Sonnet otherwise. Match the model to the cognitive load.

02 · Context

Tight scope only

Only the pipeline files, only the relevant API docs. No CLAUDE.md drift, no "while you're here, also fix Y." Narrow context, narrow blast radius.

03 · Verify

Replay last week

Run last week's real input through the new code. Diff the output against last known good. The closest thing to a unit test for a data pipeline.

04 · Loop

Quarterly drift review

Once a quarter, load 30 days of logs and ask the agent what's changed. Catch drift before it becomes outage.

Field Notes Subscribers

Vol. 05 — knowing when to call for help.

Subscribe to the next volume — the signs you've hit a wall, what kind of help to ask for, and the kinds of partnerships that compound. Plus future series. One email per release.