Hamilton Ridley Consulting
Hamilton Ridley Consulting
Hamilton Ridley Consulting
Field Notes · Vol. 04
Working with AI Agents
For New Builders · 2026
A short guide

AI is a force multiplier.

Sharper builders use AI to ship faster than they ever could. Sloppy builders use AI to ship more bugs. AI doesn't change the ratio between those two — it amplifies whatever discipline you bring. The skill that decides which one you become is the loop: the tools you pick, the context you feed, the verification you run, and the rhythm you keep.

!
The trap most beginners hit

AI agents write plausible code by default. Plausible isn't correct. The agent's confidence is not the agent's accuracy. The biggest productivity unlock in 2026 isn't a smarter model — it's a sharper review pass.

Lane 01
Tool
Match the environment to the job
Lane 02
Context
What the agent knows
Lane 03
Verify
Catch what looked right
Lane 04
Session loop
Plan, execute, review, ship
§ 01

The four lanes

Lane 01

Tool

The environment your AI agent runs in. Browser studio, desktop IDE, or terminal CLI — each has a different friction profile and a different ceiling.

What changes between optionsSpeed-to-ship vs. precision. Browser studios are the fastest start; terminal CLIs give you the most control over a real folder. Most builders end up using more than one — pick the right one for the job in front of you, not the one you used last time.
Replit Agent / Lovable
Browser studio. Idea to live URL in minutes. Best for prototypes.
Cursor / Windsurf
Desktop IDE w/ AI in every panel. The daily-driver middle ground.
Claude Code (CLI)
Terminal agent on your real folder. Runs commands, reads logs, owns Git.
Codex CLI
Same shape, different model. Useful as a second opinion when one's stuck.
Lane 02

Context

What the agent knows during a session. Files, project rules, prior conversation, related docs. The context window has a budget — what you spend it on shapes everything else.

What changes between optionsHow disciplined you are about what enters the conversation. More noise = more drift. Less noise = sharper output, fewer hallucinations.
CLAUDE.md / project rules
Persistent rules across sessions. Conventions, what NOT to touch.
@-reference specific files
Point at what's relevant. Don't dump the whole repo.
/clear between tasks
Fresh context for each unrelated problem. Cheap reset, big wins.
Sub-agents
Delegate research and let them return summaries. Protects main context.
Lane 03

Verify

AI agents write plausible-by-default code. Plausible isn't correct. Catching the difference is your job; the agent's confidence isn't the agent's accuracy.

What changes between optionsHow systematically you check before committing. "It looked right" is the most expensive review pass in software.
Read the diff
Line by line. Every change. No exceptions. Use a real diff viewer.
Run the tests
Let them catch what your eyes miss. Add tests for behavior you care about.
Smoke-test the feature
Actually click through it. Or actually run the script. End-to-end.
Watch for deletions
Agents sometimes remove unrelated code. Diff catches it; trust catches nothing.
Lane 04

Session loop

A productive session has rhythm: plan, execute, review, commit. Skip steps and you ship surprises — to your codebase, your branch, or your client.

What changes between optionsHow tight the loop is. Smaller plans, smaller commits, fewer surprises. Long unsupervised sessions are where the silent breakage happens.
Plan first
Agree on the approach before code is written. "Plan mode" or just a TODO list.
Small commits
Review at milestones. Easy to revert. AI agents commit often if you let them.
/clear between features
Fresh state, no cross-contamination. Old context bleeds into new tasks.
Rubber-duck recap
Ask the agent to summarize what it did. Catches the thing it forgot.
§ 02

The same three projects, working with AI

Project A

Contractor lead form

One page, one form, one owner. You want it live by Friday. You don't need a process; you need ergonomics.

Optimize for speed
01 · Tool
Replit Agent
Browser, no setup, live preview while you iterate. One-shot conversation.
02 · Context
Just the prompt
No CLAUDE.md needed. Single session means context is whatever's in the chat. Don't over-engineer this stage.
03 · Verify
Smoke-test it
Fill the form yourself, confirm the email lands. If it works for you, it works for the contractor.
04 · Loop
One sitting
Plan + execute + ship in an afternoon. Tight loop on a small project doesn't need ceremony.
Project B

Multi-user CRM

Eight people log in daily, real revenue runs through it, you'll be reading this code in two years. The session-to-session discipline is what makes future-you not hate present-you.

Optimize for durability
01 · Tool
Claude Code + Sonnet
CLI on your real folder. Sonnet for daily work; Opus when you're making an architecture call. The agent should own Git.
02 · Context
CLAUDE.md + sub-agents
Project rules persist across sessions: naming, what's production, what's scratch. Sub-agents for "go look at how X works in this codebase" without burning main context. Discipline pays compound.
03 · Verify
Diff + tests + staging
Diff review every commit. Playwright on critical paths. Vercel preview URL before main. Three-layer net for anything customers depend on.
04 · Loop
Plan, then execute
Any non-trivial work starts with a written plan. /clear between unrelated features. Small commits. Big sessions are where silent breakage lives.
Project C

Nightly automation

Pulls four APIs at 2 a.m., transforms data, sends a report. Nobody's watching. Subtle wrong is worse than loud broken — silent bad data corrupts decisions for weeks.

Optimize for correctness
01 · Tool
Claude Code, model-switch
CLI for the real folder. Switch to Opus for the tricky transformation logic where reasoning matters; Sonnet otherwise. Match the model to the cognitive load.
02 · Context
Tight scope only
Only the pipeline files, only the relevant API docs. No CLAUDE.md drift, no "while you're here, also fix Y." Narrow context, narrow blast radius.
03 · Verify
Replay last week
Run last week's real input through the new code. Diff the output against last known good. The closest thing to a unit test for a data pipeline.
04 · Loop
Quarterly drift review
Once a quarter, load 30 days of logs and ask the agent what's changed. Catch drift before it becomes outage.
SharePostLinkedIn
Field Notes Subscribers

Vol. 05 — knowing when to call for help.

Subscribe to the next volume — the signs you've hit a wall, what kind of help to ask for, and the kinds of partnerships that compound. Plus future series. One email per release.

Hamilton Ridley Consulting · 2026
daniel.kemp@hamiltonridley.com