Every HRC Code Review uses the same 10-category rubric.
Most code reviews are vibes. We score yours against a fixed rubric — ten categories, stable weights, the same letter-grade ladder for every customer. That's what makes the score comparable across reviews and across re-reviews. A B+ from us in May 2026 means the same thing as a B+ from us in May 2027, and the same thing as a B+ on another partner's repo.
Reports score against a single lens: production readiness. The score reflects the gap between your codebase as it stands and a deployable customer-facing service. A working proof-of-concept that validates a product idea may score in the D or F range under this lens, and that's fine — it doesn't invalidate the work. The score is a roadmap toward production, not a verdict on the work to date.
The ten categories
Weights are stable across reviews. Security carries the most weight (18%) because security gaps are existential. Reliability and Functional Correctness are next at 15% each — they're the categories where the code lies to you about whether it works. Maintainability is 12% because compounding tech debt is slow death.
| # | Category | Weight | What it measures |
|---|---|---|---|
| 01 | Functional Correctness | 15% | Does the application do what it claims? Edge cases? Reproducible defects? |
| 02 | Architecture & Structure | 10% | Module boundaries, dead-code surface, stack-vs-problem fit. |
| 03 | Security | 18% | AuthN/Z, input validation, secrets, supply chain, application-layer hardening. |
| 04 | Reliability & Error Handling | 15% | Tests, failure modes, observability, graceful degradation. |
| 05 | Maintainability & Code Quality | 12% | Readability, duplication, file size, documentation, convention adherence. |
| 06 | Operational Readiness | 8% | Deployability, configuration, logging, monitoring, health checks, hand-off readiness. |
| 07 | Portability & Vendor Lock-in | 7% | Will it run anywhere except where it was built? |
| 08 | Dependency Hygiene | 5% | Pinning, drift, license compatibility, supply-chain controls. |
| 09 | Performance & Cost | 5% | Efficiency, AI-cost awareness, scaling. |
| 10 | IP & Ownership Clarity | 5% | License documentation, copyright attribution, authorship clarity, AI-provenance. |
The overall score is the weighted average: overall = Σ (category_score × category_weight), with each weight taken as a fraction of 1 (the ten weights above total 100%). Each category is scored 0–100 against the rubric. A category's score is set by its lowest-passing tier: a single critical gap caps the category regardless of strengths elsewhere.
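As a minimal sketch of that weighted average, the snippet below uses the weights from the category table. The function name and the example scores are illustrative assumptions, not part of HRC's tooling, and the per-category capping rule is not modeled here.

```python
# Weights in percent, copied from the category table above.
WEIGHTS_PCT = {
    "Functional Correctness": 15,
    "Architecture & Structure": 10,
    "Security": 18,
    "Reliability & Error Handling": 15,
    "Maintainability & Code Quality": 12,
    "Operational Readiness": 8,
    "Portability & Vendor Lock-in": 7,
    "Dependency Hygiene": 5,
    "Performance & Cost": 5,
    "IP & Ownership Clarity": 5,
}


def overall_score(category_scores):
    """Weighted average of 0-100 category scores (hypothetical helper)."""
    missing = set(WEIGHTS_PCT) - set(category_scores)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    for name in WEIGHTS_PCT:
        if not 0 <= category_scores[name] <= 100:
            raise ValueError(f"{name}: score must be between 0 and 100")
    # Weights are percentages, so divide the weighted sum by 100.
    weighted_sum = sum(category_scores[n] * w for n, w in WEIGHTS_PCT.items())
    return weighted_sum / 100


# Illustrative scores: 80 everywhere, with a weak Security category.
example = dict.fromkeys(WEIGHTS_PCT, 80)
example["Security"] = 50
print(overall_score(example))  # 74.6
```

Because Security carries 18% of the weight, a 30-point shortfall there costs 5.4 points overall in this example.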
What an A looks like
For each category, the A-tier criteria are what you're aiming at if you want a 90+ score. Most codebases don't need every category at A — but knowing the bar lets you choose where to invest.
Functional Correctness
Weight 15%. A: Behavior matches the stated spec across happy path and edge cases. Comprehensive automated test coverage. Schema validation at every input boundary.
Architecture & Structure
Weight 10%. A: Clear single-purpose modules. No dead code. Coherent stack choice. Testable boundaries throughout. Frameworks used appropriately, not as glue.
Security
Weight 18%. A: Authentication and authorization on every protected route. Input validation at all boundaries. Secrets in a vault, not in code. No known CVE surface. Security headers, rate limiting, tested.
Reliability & Error Handling
Weight 15%. A: Comprehensive automated tests (smoke + integration + critical paths). Graceful degradation under partial failure. Structured logging. Error monitoring integrated and actually monitored.
Maintainability & Code Quality
Weight 12%. A: Clean code, no notable duplication. Consistent naming. Small focused modules. Documented public APIs. README and ARCHITECTURE.md present. Conventions are load-bearing: CI checks or lint rules enforce them.
Operational Readiness
Weight 8%. A: Dockerized. Structured logging, metrics, health + readiness probes. Runbook documented. Configuration via env vars. A new engineer or vendor can productively own this within a week.
Portability & Vendor Lock-in
Weight 7%. A: Standard tooling. Deploys to any container platform. Vendor SDKs abstracted behind interfaces so swapping providers is a configuration change, not a rewrite.
Dependency Hygiene
Weight 5%. A: Single source of truth per language. Pinned via lockfile. Supply-chain controls (release-age guards, allowlists). License-clean. No known CVEs.
Performance & Cost
Weight 5%. A: Profiled at expected scale. Appropriate caching. Cost-aware (batched, cached, capped, especially for AI provider calls). No obvious bottlenecks.
IP & Ownership Clarity
Weight 5%. A: LICENSE file present with full text matching all manifest declarations. CONTRIBUTING.md present. Copyright headers in source files. Clear author attribution. If AI tools generated any code, the license posture and attribution are explicit.
Letter grades
The overall score maps to a letter grade. The grade is what partners typically share — it's the artifact on the public badge, on the README, in the pitch deck. The underlying number preserves the precision the categories produced.
| Range | Grade | Meaning |
|---|---|---|
| 90–100 | A | Production exemplar |
| 85–89 | A− | Production-ready with minor cleanup |
| 80–84 | B+ | Production-ready after focused remediation |
| 75–79 | B | Functional; significant focused work before production |
| 70–74 | B− | Functional but multiple gaps |
| 65–69 | C+ | Solid prototype; broad remediation required |
| 60–64 | C | Working prototype; major gaps before production |
| 55–59 | C− | Proof-of-concept; substantial rework required |
| 50–54 | D+ | Early prototype |
| 40–49 | D | Not production-viable as-is |
| <40 | F | Fundamental rework required |
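The ladder above can be sketched as a simple threshold lookup. The table lists whole-number ranges, so how a fractional score such as 89.5 is graded is not stated; this sketch assumes a score earns a grade once it reaches that grade's lower bound. The function name is hypothetical, and grades use an ASCII hyphen for the minus sign.

```python
# (lower bound, grade) pairs from the letter-grade table, highest first.
GRADE_LADDER = [
    (90, "A"),
    (85, "A-"),
    (80, "B+"),
    (75, "B"),
    (70, "B-"),
    (65, "C+"),
    (60, "C"),
    (55, "C-"),
    (50, "D+"),
    (40, "D"),
]


def letter_grade(score):
    """Map a 0-100 overall score to a letter grade (hypothetical helper)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # Assumption: a score gets the first grade whose lower bound it reaches.
    for lower_bound, grade in GRADE_LADDER:
        if score >= lower_bound:
            return grade
    return "F"  # anything below 40


print(letter_grade(66))  # C+
print(letter_grade(84))  # B+
```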
Severity levels
Every finding in your report is tagged with one of four severity levels. Every P0 and P1 finding carries both an effort estimate (engineer-hours to close) AND a cost-of-inaction (what gets worse if you defer it). Effort tells you what fixing costs; cost-of-inaction tells you what delaying costs. Both are required for honest prioritization.
P0: Blocking. Must be addressed before any production exposure. Examples: no authentication on a sensitive route; production secret hardcoded in source; SQL injection on a user-input path; default admin credential active; missing CSRF on state-changing endpoints; auth bypass via parameter manipulation.
P1: High. Required before extending the application or onboarding additional users. Examples: rate limiting absent on a costly endpoint; RBAC scope bypass via specific parameter shape; dependency with a known CVE patched in the next major version; missing transaction handling on financial operations; sensitive data logged in plaintext.
P2: Medium. Quality issues to address opportunistically; do not block the next release.
P3: Informational. Nice-to-have refinements.
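The rule that every P0 and P1 finding carries both an effort estimate and a cost-of-inaction can be checked mechanically. The sketch below is one way to do it. The `Finding` fields and the `validate` helper are hypothetical rather than HRC's actual report schema, and the labels P2 and P3 for the two lower levels are an assumption extended from the P0/P1 naming.

```python
from dataclasses import dataclass
from typing import Optional

# P0 and P1 are named in the text; P2 and P3 are assumed labels
# for the Medium and Informational levels.
SEVERITIES = ("P0", "P1", "P2", "P3")


@dataclass
class Finding:
    title: str
    severity: str
    effort_hours: Optional[float] = None    # engineer-hours to close
    cost_of_inaction: Optional[str] = None  # what gets worse if deferred


def validate(finding):
    """Return a list of problems; an empty list means the finding passes."""
    problems = []
    if finding.severity not in SEVERITIES:
        problems.append(f"unknown severity: {finding.severity}")
    if finding.severity in ("P0", "P1"):
        if finding.effort_hours is None:
            problems.append("P0/P1 finding needs an effort estimate")
        if not finding.cost_of_inaction:
            problems.append("P0/P1 finding needs a cost-of-inaction")
    return problems


incomplete = Finding("Rate limiting absent on a costly endpoint", "P1")
print(validate(incomplete))  # two problems: no effort, no cost-of-inaction
```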
Three tiers of HRC engagement
The same v2.0 methodology powers three product tiers, each scaled to a different stage in the partner's decision-making. Lower tiers are glimpses of the full review, not substitutes for it.
Quick Score
Letter grade, 10-category breakdown, severity counts, and 3 headline finding titles. Automated, free, email-gated. The entry point.
Findings Reveal
Your top 15 P0/P1 findings with file:line citations and recommended fixes. Branded HRC HTML report. 30-day refund guarantee if any finding is verifiably false.
Standard Code Review
Every finding across all 10 categories. Per-category analysis. Decisions Required, Open Questions, bug catalog. Human-verified. Public score badge. 30-min walkthrough call. The full HRC stamp on your codebase.
Each tier's job is to graduate the partner into the next. Quick Score is curiosity. Findings Reveal is fear (you saw 3 P0s, now you want to know which 3). Standard is professionalism (you're ready to show clients or a board the result, with HRC's name on it). Same methodology, scaled to the partner's decision threshold.
Re-reviews and the score arc
Most code-review services give you a single number once and disappear. Our model is different: when you remediate, you can buy a Re-Review (at 50% of the original tier's price) and we do a fresh code read at the new commit, then mechanically compare against your prior findings.json. Each finding is verified at its file:line and marked closed, partial, still-open, or already-closed (recon correction). No prior scores or status are inherited.
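To illustrate the mechanical comparison step, here is a minimal sketch that tallies re-review verdicts against a prior findings list. The structure of findings.json is not documented here, so the `id` key, the shape of `verdicts`, and the function name are all assumptions made for illustration only.

```python
from collections import Counter

# The four outcomes named in the text.
STATUSES = ("closed", "partial", "still-open", "already-closed")


def summarize_rereview(prior_findings, verdicts):
    """Count outcomes for every prior finding.

    prior_findings: list of dicts with a hypothetical "id" key.
    verdicts: dict mapping finding id to one of STATUSES, produced by
    the fresh read at the new commit.
    """
    unverified = [f["id"] for f in prior_findings if f["id"] not in verdicts]
    if unverified:
        # Nothing is inherited: each prior finding needs a fresh verdict.
        raise ValueError(f"findings without a verdict: {unverified}")
    invalid = {v for v in verdicts.values() if v not in STATUSES}
    if invalid:
        raise ValueError(f"unknown statuses: {sorted(invalid)}")
    counts = Counter(verdicts[f["id"]] for f in prior_findings)
    return {status: counts.get(status, 0) for status in STATUSES}


prior = [{"id": "F-001"}, {"id": "F-002"}, {"id": "F-003"}]
fresh = {"F-001": "closed", "F-002": "partial", "F-003": "closed"}
print(summarize_rereview(prior, fresh))
```

Raising on a missing verdict mirrors the rule that no prior status carries over: a finding the fresh read did not examine is an error, not a default.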
That's what makes the score arc honest. When you go from C+ (66) to B+ (84) across three review cycles, the chart isn't HRC marking your homework — it's a fresh-read assessment at each anchor commit, calibrated by the same rubric. The partners who've gone through three cycles know the arc is real.
Every score and badge is anchored to a specific commit SHA. The badge represents the codebase at that anchor, not a perpetual warranty. After 180 days the badge auto-renders with a "Score from [date]" caption. We don't vouch for changes we didn't read.
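The 180-day rule can be sketched as a small date check. The label layout, the short-SHA convention, and the ISO date format below are assumptions for illustration; only the 180-day threshold and the "Score from [date]" wording come from the text.

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=180)


def badge_caption(grade, commit_sha, scored_on, today):
    """Build badge text, adding the date caption once the score is stale."""
    # Hypothetical label layout: grade plus the short commit SHA.
    label = f"{grade} @ {commit_sha[:7]}"
    # "After 180 days" is read here as strictly more than 180 days old.
    if today - scored_on > STALE_AFTER:
        return f"{label}, Score from {scored_on.isoformat()}"
    return label


scored = date(2026, 5, 1)
print(badge_caption("B+", "0123456789abcdef", scored, scored + timedelta(days=30)))
print(badge_caption("B+", "0123456789abcdef", scored, scored + timedelta(days=181)))
```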
Get a real grade.
Free Quick Score gives you a letter grade and your top 3 takeaways in 5 minutes. The full Standard Code Review applies this rubric with file:line citations and prioritized actions.
Quick Score: Automated 10-category scan. Letter grade, top 3 takeaways, embeddable badge. 5 minutes.
Standard Code Review: Full HRC v2.0 review applied by a human engineer. Branded report, findings.json, badge, 30-min walkthrough. 5 business days.