Hamilton Ridley Consulting

Field Notes · Vol. 06

How HRC Scores Your Code
The published rubric · 2026

Every HRC Code Review uses the same 10-category rubric.

Daniel Kemp · Published May 11, 2026 · 11 min read

Most code reviews are vibes. We score yours against a fixed rubric — ten categories, stable weights, the same letter-grade ladder for every customer. That's what makes the score comparable across reviews and across re-reviews. A B+ from us in May 2026 means the same thing as a B+ from us in May 2027, and the same thing as a B+ on another partner's repo.

One evaluation lens

Reports score against a single lens: production readiness. The score reflects the gap between your codebase as it stands and a deployable customer-facing service. A working proof-of-concept that validates a product idea may score in the D or F range under this lens, and that's fine — it doesn't invalidate the work. The score is a roadmap toward production, not a verdict on the work to date.

§ 01

The ten categories

Weights are stable across reviews. Security carries the most weight (18%) because security gaps are existential. Reliability and Functional Correctness are next at 15% each — they're the categories where the code lies to you about whether it works. Maintainability is 12% because compounding tech debt is slow death.

#	Category	Weight	What it measures
01	Functional Correctness	15%	Does the application do what it claims? Edge cases? Reproducible defects?
02	Architecture & Structure	10%	Module boundaries, dead-code surface, stack-vs-problem fit.
03	Security	18%	AuthN/Z, input validation, secrets, supply chain, application-layer hardening.
04	Reliability & Error Handling	15%	Tests, failure modes, observability, graceful degradation.
05	Maintainability & Code Quality	12%	Readability, duplication, file size, documentation, convention adherence.
06	Operational Readiness	8%	Deployability, configuration, logging, monitoring, health checks, hand-off readiness.
07	Portability & Vendor Lock-in	7%	Will it run anywhere except where it was built?
08	Dependency Hygiene	5%	Pinning, drift, license compatibility, supply-chain controls.
09	Performance & Cost	5%	Efficiency, AI-cost awareness, scaling.
10	IP & Ownership Clarity	5%	License documentation, copyright attribution, authorship clarity, AI-provenance.

The overall score is the weighted average: overall = Σ (category_score × category_weight), where the weights are the percentages in the table above and sum to 100%. Each category is scored 0–100 against the rubric. Within a category, the score is the lowest-passing tier: a single critical gap caps the category regardless of strengths elsewhere.
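As a minimal sketch of the weighted average, the Python below uses the weights from the category table. The function name `overall_score` and the example scores are made up for illustration; this is not HRC's scoring tooling, and it does not model the per-category cap.

```python
# Weights copied from the category table, as percentages; they sum to 100.
WEIGHTS = {
    "Functional Correctness": 15,
    "Architecture & Structure": 10,
    "Security": 18,
    "Reliability & Error Handling": 15,
    "Maintainability & Code Quality": 12,
    "Operational Readiness": 8,
    "Portability & Vendor Lock-in": 7,
    "Dependency Hygiene": 5,
    "Performance & Cost": 5,
    "IP & Ownership Clarity": 5,
}


def overall_score(category_scores: dict[str, float]) -> float:
    """Weighted average of 0-100 category scores, using percentage weights."""
    if set(category_scores) != set(WEIGHTS):
        raise ValueError("a score is required for each of the ten categories")
    total = sum(category_scores[name] * weight for name, weight in WEIGHTS.items())
    return total / sum(WEIGHTS.values())


# Hypothetical codebase: 80 in every category except Security at 40.
example = {name: 80.0 for name in WEIGHTS}
example["Security"] = 40.0
print(overall_score(example))  # 72.8
```

In this made-up case a single weak category at 18% weight pulls an otherwise uniform 80 down by about seven points, which is why the weights matter when choosing where to invest.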

§ 02

What an A looks like

For each category, the A-tier criteria are what you're aiming at if you want a 90+ score. Most codebases don't need every category at A — but knowing the bar lets you choose where to invest.

Functional Correctness

15%

A: Behavior matches the stated spec across happy path and edge cases. Comprehensive automated test coverage. Schema validation at every input boundary.

Architecture & Structure

10%

A: Clear single-purpose modules. No dead code. Coherent stack choice. Testable boundaries throughout. Frameworks used appropriately, not as glue.

Security

18%

A: Authentication and authorization on every protected route. Input validation at all boundaries. Secrets in a vault, not in code. No known CVE surface. Security headers and rate limiting in place and tested.

Reliability & Error Handling

15%

A: Comprehensive automated tests (smoke + integration + critical paths). Graceful degradation under partial failure. Structured logging. Error monitoring integrated and actually monitored.

Maintainability & Code Quality

12%

A: Clean code, no notable duplication. Consistent naming. Small focused modules. Documented public APIs. README and ARCHITECTURE.md present. Conventions are load-bearing — CI checks or lint rules enforce them.

Operational Readiness

8%

A: Dockerized. Structured logging, metrics, health + readiness probes. Runbook documented. Configuration via env vars. A new engineer or vendor can productively own this within a week.

Portability & Vendor Lock-in

7%

A: Standard tooling. Deploys to any container platform. Vendor SDKs abstracted behind interfaces so swapping providers is a configuration change, not a rewrite.
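One way to picture "abstracted behind interfaces" is the sketch below. Everything in it is hypothetical: the `BlobStore` interface, the two stand-in backends, and the `BLOB_BACKEND` / `BLOB_ROOT` configuration keys are illustrative names, and a real vendor SDK adapter would simply be one more class that satisfies the same interface.

```python
from pathlib import Path
from typing import Protocol


class BlobStore(Protocol):
    """The only storage surface application code is allowed to depend on."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryBlobStore:
    """Stand-in backend that keeps blobs in a dict."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class LocalDiskBlobStore:
    """Stand-in backend that writes each blob to a file under a root directory."""

    def __init__(self, root: Path) -> None:
        self._root = root
        self._root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        (self._root / key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()


def make_blob_store(config: dict[str, str]) -> BlobStore:
    """Choose the backend from configuration, so swapping providers needs no code change."""
    backend = config.get("BLOB_BACKEND", "memory")
    if backend == "memory":
        return InMemoryBlobStore()
    if backend == "disk":
        return LocalDiskBlobStore(Path(config["BLOB_ROOT"]))
    raise ValueError(f"unknown blob backend: {backend}")
```

Application code calls `make_blob_store(config)` once and only ever sees `put` and `get`, so the provider choice lives in configuration rather than in call sites.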

Dependency Hygiene

5%

A: Single source of truth per language. Pinned via lockfile. Supply-chain controls (release-age guards, allowlists). License-clean. No known CVEs.

Performance & Cost

5%

A: Profiled at expected scale. Appropriate caching. Cost-aware (batched, cached, capped — especially for AI provider calls). No obvious bottlenecks.
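To make "cached, capped" concrete, here is a small sketch of a wrapper around an expensive provider call. The class `CappedCachedClient`, the `BudgetExceeded` error, and the idea of counting calls rather than tokens or dollars are all assumptions for illustration; a real cap would more likely track the provider's actual billing unit.

```python
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised when the call cap would be exceeded."""


class CappedCachedClient:
    """Wrap an expensive call with an in-memory cache and a hard call cap."""

    def __init__(self, call: Callable[[str], str], max_calls: int) -> None:
        self._call = call
        self._max_calls = max_calls
        self._cache: dict[str, str] = {}
        self.calls_made = 0

    def complete(self, prompt: str) -> str:
        # Cache hits cost nothing and do not count against the cap.
        if prompt in self._cache:
            return self._cache[prompt]
        if self.calls_made >= self._max_calls:
            raise BudgetExceeded(f"cap of {self._max_calls} provider calls reached")
        self.calls_made += 1
        result = self._call(prompt)
        self._cache[prompt] = result
        return result
```

Passing the provider call in as a plain function keeps the wrapper testable with a fake and independent of any particular SDK.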

IP & Ownership Clarity

5%

A: LICENSE file present with full text matching all manifest declarations. CONTRIBUTING.md present. Copyright headers in source files. Clear author attribution. If AI tools generated any code, the license posture and attribution are explicit.

§ 03

Letter grades

The overall score maps to a letter grade. The grade is what partners typically share — it's the artifact on the public badge, on the README, in the pitch deck. The underlying number preserves the precision the categories produced.

Range	Grade	Meaning
90–100	A	Production exemplar
85–89	A−	Production-ready with minor cleanup
80–84	B+	Production-ready after focused remediation
75–79	B	Functional; significant focused work before production
70–74	B−	Functional but multiple gaps
65–69	C+	Solid prototype; broad remediation required
60–64	C	Working prototype; major gaps before production
55–59	C−	Proof-of-concept; substantial rework required
50–54	D+	Early prototype
40–49	D	Not production-viable as-is
<40	F	Fundamental rework required
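The table can be read as a simple lookup on the lower bound of each band. The sketch below is one way to write it; the function name `letter_grade` is illustrative, and the handling of fractional scores (84.6 stays a B+ until it reaches 85) is an assumption, since the table only lists whole numbers.

```python
# (lower bound, grade) pairs taken from the table, highest band first.
GRADE_FLOORS = [
    (90, "A"),
    (85, "A−"),
    (80, "B+"),
    (75, "B"),
    (70, "B−"),
    (65, "C+"),
    (60, "C"),
    (55, "C−"),
    (50, "D+"),
    (40, "D"),
]


def letter_grade(score: float) -> str:
    """Map a 0-100 overall score to a letter grade.

    Assumption: a fractional score belongs to the highest band whose lower
    bound it meets, so anything below 40 is an F.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for floor, grade in GRADE_FLOORS:
        if score >= floor:
            return grade
    return "F"


print(letter_grade(66), letter_grade(84))  # C+ B+
```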

§ 04

Severity levels

Every finding in your report is tagged with one of four severity levels. Every P0 and P1 finding carries both an effort estimate (engineer-hours to close) and a cost-of-inaction (what gets worse if you defer it). Effort tells you what fixing costs; cost-of-inaction tells you what delaying costs. Both are required for honest prioritization.

P0 · Blocking. Must be addressed before any production exposure. Examples: no authentication on a sensitive route; production secret hardcoded in source; SQL injection on a user-input path; default admin credential active; missing CSRF protection on state-changing endpoints; auth bypass via parameter manipulation.

P1 · High. Required before extending the application or onboarding additional users. Examples: rate limiting absent on a costly endpoint; RBAC scope bypass via specific parameter shape; dependency with a known CVE patched in the next major version; missing transaction handling on financial operations; sensitive data logged in plaintext.

P2 · Medium. Quality issues to address opportunistically; do not block the next release.

P3 · Informational. Nice-to-have refinements.
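The rule that P0 and P1 findings must carry both an effort estimate and a cost-of-inaction can be expressed as a validation step. The sketch below is hypothetical throughout: the `Finding` fields are illustrative, not the report's actual schema, and it assumes the labels P0 through P3 correspond, in order, to Blocking, High, Medium, and Informational.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(Enum):
    # Assumed mapping of P-levels to the four severity names.
    P0 = "Blocking"
    P1 = "High"
    P2 = "Medium"
    P3 = "Informational"


@dataclass(frozen=True)
class Finding:
    title: str
    severity: Severity
    effort_hours: Optional[float] = None    # engineer-hours to close
    cost_of_inaction: Optional[str] = None  # what gets worse if deferred

    def __post_init__(self) -> None:
        # P0 and P1 findings are rejected unless both fields are filled in.
        if self.severity in (Severity.P0, Severity.P1):
            if self.effort_hours is None or not self.cost_of_inaction:
                raise ValueError(
                    f"{self.severity.name} finding needs effort and cost-of-inaction"
                )
```

Constructing `Finding("Hardcoded secret", Severity.P0)` without the two extra fields raises, while a P2 or P3 finding may omit them.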

§ 04.5

Three tiers of HRC engagement

The same v2.0 methodology powers three product tiers, each scaled to a different stage in the partner's decision-making. Lower tiers are glimpses of the full review, not substitutes for it.

Tier 01 · $0

Quick Score

Letter grade, 10-category breakdown, severity counts, and 3 headline finding titles. Automated, free, email-gated. The entry point.

Tier 02 · $300

Findings Reveal

Your top 15 P0/P1 findings with file:line citations and recommended fixes. Branded HRC HTML report. 30-day refund guarantee if any finding is verifiably false.

Tier 03 · $2,000

Standard Code Review

Every finding across all 10 categories. Per-category analysis. Decisions Required, Open Questions, bug catalog. Human-verified. Public score badge. 30-min walkthrough call. The full HRC stamp on your codebase.

Each tier's job is to graduate the partner into the next. Quick Score is curiosity. Findings Reveal is fear (you saw 3 P0s, now you want to know which 3). Standard is professionalism (you're ready to show clients or a board the result, with HRC's name on it). Same methodology, scaled to the partner's decision threshold.

§ 05

Re-reviews and the score arc

Most code-review services give you a single number once and disappear. Our model is different: when you remediate, you can buy a Re-Review (50% of the original tier's price) and we do a fresh code read at the new commit, then mechanically compare against your prior findings.json. Each finding is verified at its file:line and marked closed, partial, still-open, or already-closed (recon correction). No prior scores or statuses are inherited.
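The mechanical comparison might look roughly like the sketch below. The findings.json layout used here (a list of objects with `id`, `file`, and `line`) is a guess made for illustration, not the published schema, and `compare_to_prior` is a hypothetical name. The point it shows is that every prior finding needs a fresh verdict and nothing is carried over by default.

```python
import json
from collections import Counter

# The four statuses named in the text.
STATUSES = {"closed", "partial", "still-open", "already-closed"}


def compare_to_prior(prior_findings_json: str, verdicts: dict[str, str]) -> Counter:
    """Tally fresh-read verdicts for every finding in a prior findings.json.

    `verdicts` maps a finding id to the status decided at the new commit.
    A prior finding without a valid verdict is an error, never a default.
    """
    prior = json.loads(prior_findings_json)
    tally: Counter = Counter()
    for finding in prior:
        status = verdicts.get(finding["id"])
        if status not in STATUSES:
            where = f"{finding['file']}:{finding['line']}"
            raise ValueError(f"{finding['id']} at {where} has no valid verdict")
        tally[status] += 1
    return tally
```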

That's what makes the score arc honest. When you go from C+ (66) to B+ (84) across three review cycles, the chart isn't HRC marking your homework — it's a fresh-read assessment at each anchor commit, calibrated by the same rubric. The partners who've gone through three cycles know the arc is real.

The accountability commitment

Every score and badge is anchored to a specific commit SHA. The badge represents the codebase at that anchor, not a perpetual warranty. After 180 days the badge auto-renders with a "Score from [date]" caption. We don't vouch for changes we didn't read.
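The 180-day rule is easy to state in code. This is a sketch only: the function name `badge_caption`, the ISO date format, and the reading of "after 180 days" as strictly more than 180 days since the score date are assumptions, not a description of how the badge is actually rendered.

```python
from datetime import date, timedelta
from typing import Optional

STALE_AFTER = timedelta(days=180)


def badge_caption(scored_on: date, today: date) -> Optional[str]:
    """Return a "Score from [date]" caption once the score is over 180 days old."""
    if today - scored_on > STALE_AFTER:
        return f"Score from {scored_on.isoformat()}"
    return None  # still fresh: no caption
```

For a score dated 2026-05-11, the caption would be absent on day 180 and read "Score from 2026-05-11" from day 181 onward.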

Apply this rubric to your code

Get a real grade.

The free Quick Score gives you a letter grade and your top 3 takeaways in 5 minutes. The full Standard Code Review applies this rubric with file:line citations and prioritized actions.

Quick Score

Free

Automated 10-category scan. Letter grade, top 3 takeaways, embeddable badge. 5 minutes.

Standard Code Review

$2,000 / project

Full HRC v2.0 review applied by a human engineer. Branded report, findings.json, badge, 30-min walkthrough. 5 business days.
