Professional notes by Craig Johnston
long-form, short-form, working drafts · since 2008
VOL. XIX · MMXXVI
82 NOTES IN PRINT
FOLIO LXXXII 22 APR 2026 · 16 MIN · LONG-FORM

The Pre-Commit Review Gate

AI on a Leash: Mechanical Self-Review for Claude Code

Diagram · folio lxxxii
flowchart TB
  A([Agent edits code]) --> B{{git commit}}
  B --> H[/PreToolUse hook/]
  H --> C{Trivial diff?<br/>Doc-only?<br/>Plan mode?}
  C -->|yes| OK([allow commit])
  C -->|no| D{Artifact valid?<br/>hash matches?<br/>verdict CLEAN?}
  D -->|yes| OK
  D -->|no| DENY[/deny + return<br/>review prompt/]
  DENY --> SUB([general-purpose<br/>sub-agent review])
  SUB --> F{Findings?}
  F -->|N findings| FIX([fix in working tree])
  FIX --> SUB
  F -->|CLEAN| ART[(write artifact:<br/>diff hash + verdict)]
  ART --> B

A pull request shouldn’t need four review rounds. But that’s what I kept getting from Claude Code: write code, run tests, claim done, ask for review, find real problems, fix, push, ask again, find more, fix, push, repeat. Across several PRs the pattern was identical. Tokens, time, and CI cycles burned on a loop the agent could have closed itself before the first commit.

§TL;DR

A PreToolUse hook on git commit blocks the commit until an adversarial sub-agent has reviewed the working-tree diff and produced a Verdict: CLEAN artifact tied to that exact diff. The gate is mechanical. The agent literally cannot commit without it. No more vibe-checks dressed up as self-review.

AI on a Leash Series | Previous: Ralph’s Uncle covers verification principles. Complete Go Project Configuration covers the test toolchain. This article covers the missing gate between “code passed local tests” and “code is committed.”

§The Cycle You Already Know

If you’ve worked with Claude Code on anything past a toy project, you’ve seen this:

  1. The agent writes code, runs make verify, and reports “all tests pass, the change is complete.”
  2. You ask for a critical review.
  3. The agent finds real issues that should have been caught the first time.
  4. The agent commits “fixes,” pushes.
  5. You ask for another critical review.
  6. The agent finds more real issues.
  7. Repeat.

Three or four iterations per PR is normal. The fixes are not invented problems. They’re real bugs: backend mismatches, doc-vs-code drift, tests that pass but don’t exercise the failure they claim to test, magic numbers that escaped a refactor. The agent has the capacity to find them. It just doesn’t, until you ask.

The framing I landed on after the third PR with this pattern: you can write code, and you can review code, but for some reason you have to be told to review the code you write.

That has to stop being true. The fix is not asking nicely. The fix is making “review your code” a precondition the runtime enforces, not a habit you hope the agent adopts.

§Why Self-Review Memory Entries Don’t Work

The first thing I tried was a memory entry: a checklist of common gaps the agent should look for before claiming done. Backend consistency, doc-vs-code drift, tests that match their claims, edge cases, error paths.

It didn’t work, and the failure mode is worth understanding because it’s the same failure mode every “be more careful” instruction hits.

The agent treats the checklist as a vibe-check. It reads the items, mentally answers “yes I considered that,” and moves on. There is no forcing function. Tests pass. The agent says done. You find things. The checklist might as well not exist.

A more procedural version (do this, then this, then this) helps a little but still gets skipped under any kind of time pressure or context bloat. Items get noted as “I should fix this” and then quietly buried as the conversation moves on.

The lesson: in-context discipline does not survive contact with the model’s own confidence. If you want the agent to do something every single time, you cannot put the requirement in the agent’s prompt. You have to put it in the runtime.

§Why Stop Hooks and PR-Open Hooks Are Too Late

The first runtime-level fix I considered was a Stop hook: at the end of every turn, run an automated review against the working tree.

Wrong trigger point. By the time Stop fires, the agent may have already committed. Claude Code’s commit workflow runs inside a single tool sequence that doesn’t always cross a Stop boundary. Gating at PR-open time has the same problem one layer up: the commit and the push are already done, and you’re paying the cost of every “fix to address review” commit thereafter.

The whole point of moving the review left is to avoid the round-trip cost of commit, push, review, fix-commit, push. If the gate fires after any of those steps, you’ve already bought the ticket.

The review needs to fire before the commit. Specifically, when the agent calls the Bash tool with a git commit command, the hook needs to intercept and decide whether to allow or deny it.

That is a PreToolUse hook. PreToolUse is the only place on the timeline where you can cheaply force the work to happen before any cost is sunk.

§The Architecture

Three coupled pieces. The first is the gate. The other two reinforce it.

§The Gate: A PreToolUse Hook on git commit

A shell script registered as a PreToolUse hook in ~/.claude/settings.json. When the agent calls Bash with a command matching git commit (including git commit --amend), the hook intercepts and runs through this decision tree:

  1. Kill switch. If ~/.claude/.review-gate-disabled exists, allow. Used for debugging the gate itself.
  2. Plan mode. If .claude/.plan-mode exists in the project, allow. Plan mode is read-only by design.
  3. Trivial diff. If staged + unstaged changes total fewer than 20 lines, allow. Typo fixes pass through; otherwise the gate trains the agent to dismiss it as overhead.
  4. Doc-only diff. If no *.go|*.sql|*.sh|*.py|*.ts|*.tsx|*.js files changed, allow. Pure docs changes don’t need code review.
  5. Otherwise: hash the combined output of git diff --cached HEAD; git diff with SHA-256, truncate the hash to 16 hex characters, and look for a review artifact at .claude/.last-review.md that records the same hash.
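Steps 3 and 4 above come down to two small predicates. A minimal sketch, assuming line counts come from git diff --numstat and paths from git diff --name-only; the function names are hypothetical and the real script may count differently:

```shell
# Hypothetical helpers; names and exact rules are assumptions, not the
# author's script.

# Sum added + deleted lines from `git diff --numstat` output on stdin.
# Binary files report "-" for both counts and are treated as zero here.
count_changed_lines() {
  awk '{ if ($1 != "-") n += $1; if ($2 != "-") n += $2 } END { print n + 0 }'
}

# Succeed if any path on stdin (one per line) has a gated code extension.
has_code_files() {
  grep -Eq '\.(go|sql|sh|py|ts|tsx|js)$'
}

# Inside a repository the two checks could be fed like this:
#   { git diff --cached --numstat HEAD; git diff --numstat; } | count_changed_lines
#   { git diff --cached --name-only HEAD; git diff --name-only; } | has_code_files
```

Keeping both helpers as stdin filters means they can be exercised with synthetic input, without a repository.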

If the artifact is missing, stale, or doesn’t carry verdict: CLEAN, the hook denies the tool call. The denial returns hookSpecificOutput.permissionDecision: "deny" with the full sub-agent review prompt as the reason. The agent’s next turn fires with explicit instructions and zero excuse not to spawn the review.
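The deny response can be sketched as below. This assumes jq is installed; the hookEventName and permissionDecisionReason field names follow my reading of the Claude Code hooks documentation and should be checked against your version. The helper name emit_deny is hypothetical.

```shell
# Hypothetical helper: print a PreToolUse deny decision on stdout.
# The reason is passed through --arg so jq handles all JSON escaping.
emit_deny() {
  local reason="$1"
  jq -n --arg reason "$reason" '{
    hookSpecificOutput: {
      hookEventName: "PreToolUse",
      permissionDecision: "deny",
      permissionDecisionReason: $reason
    }
  }'
}
```

Because the reason is bound as a jq variable rather than spliced into the JSON text, quotes and newlines in the review prompt cannot break the output.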

§The Review: An Adversarial Sub-Agent

When the gate denies, the agent spawns a general-purpose sub-agent with a structured prompt that requires explicit output sections rather than freeform thoughts. The prompt forces the sub-agent to produce:

  • Numbered findings, each with file:line.
  • A paired-backend audit: every contract, every backend, every edge case.
  • A doc-vs-code walkthrough: every new doc paragraph traced against implementing code.
  • A test-vs-claim audit: every new test verified to actually exercise the failure mode it claims.
  • A verdict: CLEAN or N findings.

I picked general-purpose deliberately. The Plan agent is read-only and constrains fix suggestions. Custom code-reviewer plugins risk being changed out from under you. With general-purpose, the prompt controls behavior end to end, and the prompt is yours to version.

The agent loop is then strict:

  1. Read findings.
  2. Fix every finding in the working tree. No “follow-up” deferrals.
  3. Re-spawn the sub-agent on the new diff.
  4. Loop until Verdict: CLEAN.
  5. Cap at three rounds. If round three still has findings, the change is too large for one commit and must be split.

§The Artifact: A Hash That Pins the Verdict to a Diff

.claude/.last-review.md carries:

reviewed_hash: <16-char sha256 of git diff --cached HEAD; git diff>
findings_count: <int>
verdict: CLEAN

Any subsequent edit changes the diff hash. The next commit attempt re-blocks until a fresh review is performed and a new artifact is written.

The .claude/ directory is gitignored, so the artifact never lands in the repo.
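A minimal sketch of the hash and the artifact check, assuming the 16 characters are the leading hex digits of the SHA-256 and that sha256sum or shasum is on the PATH. All three function names are hypothetical:

```shell
# Hypothetical helpers; the real script's function names are not shown
# in the article.

# First 16 hex characters of the SHA-256 of stdin. Uses sha256sum when
# present (GNU coreutils) and falls back to shasum (macOS, Perl).
hash16() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | cut -c1-16
  else
    shasum -a 256 | cut -c1-16
  fi
}

# Hash of the staged plus unstaged diff, in the order the gate describes.
current_diff_hash() {
  { git diff --cached HEAD; git diff; } | hash16
}

# Succeed only if the artifact records the expected hash and a CLEAN verdict.
artifact_is_clean() {
  local artifact="$1" expected="$2"
  [ -f "$artifact" ] || return 1
  grep -qxF "reviewed_hash: $expected" "$artifact" || return 1
  grep -qxF "verdict: CLEAN" "$artifact"
}
```

A gate built on these would allow the commit only when artifact_is_clean .claude/.last-review.md "$(current_diff_hash)" succeeds.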

§Files Installed

~/.claude/hooks/review-gate.sh                    # the hook script (executable)
~/.claude/hooks/review-prompt-template.md         # the adversarial prompt template
~/.claude/settings.json                           # PreToolUse + PostToolUse entries
~/.claude/projects/<project>/memory/feedback_self_review_checklist.md   # memory pointer

The hook script lives outside the project so it works across every repo on the machine. The prompt template is a separate file (not inline in the script) so it’s auditable and can evolve without touching hook logic. The memory entry is a short pointer to the gate, not a procedural checklist (the gate itself does that work).

§The Hook Script

The script supports two modes selected by its first argument:

  • precommit: PreToolUse hook for Bash. Denies git commit calls without a valid artifact.
  • size-warn: PostToolUse hook for Edit|Write|MultiEdit. Emits a systemMessage warning when the cumulative diff vs origin/main exceeds 800 lines. Soft signal that the change should be split.

The opening section captures the design choices that took the longest to get right:

#!/usr/bin/env bash
#
# review-gate.sh - pre-commit adversarial-review gate.
#
# Two modes (selected by $1):
#   precommit   PreToolUse hook for Bash. Blocks git commit when the
#               working tree has un-reviewed code changes.
#   size-warn   PostToolUse hook for Edit|Write|MultiEdit. Emits a
#               systemMessage warning if the cumulative diff vs
#               origin/main exceeds 800 lines.
#
# Fail-open: any internal error allows the action and emits a stderr
# warning. A buggy gate must never block legitimate work.

set -uo pipefail

MODE="${1:-precommit}"
HOOK_INPUT="$(cat)"

if [[ -f "$HOME/.claude/.review-gate-disabled" ]]; then
  exit 0
fi

PROJECT_DIR="$(printf '%s' "$HOOK_INPUT" | jq -r '.cwd // empty' 2>/dev/null)"
if [[ -z "$PROJECT_DIR" ]]; then
  PROJECT_DIR="${CLAUDE_PROJECT_DIR:-$PWD}"
fi
cd "$PROJECT_DIR" 2>/dev/null || exit 0

if [[ -f "$PROJECT_DIR/.claude/.plan-mode" ]]; then
  exit 0
fi

if ! git rev-parse --git-dir >/dev/null 2>&1; then
  exit 0
fi

A few invariants worth calling out:

  • Fail-open posture. Any internal error allows the action and prints a stderr warning. A buggy gate that blocks legitimate work is worse than no gate, because you’ll disable it permanently and never re-enable.
  • JSON output via jq -n --arg. When you build the deny response, the --arg form properly JSON-escapes multi-line strings. Trying to build the JSON with string concatenation will hand you a broken hook the first time the diff contains a quote or a newline.
  • Diff size threshold. Trivial-diff skip is essential. Without it, every typo-fix triggers a review and trains the agent to dismiss the gate as bureaucracy.
  • Doc-only skip. A pure docs change should not block on code review.
  • Project dir from the .cwd field. The hook input JSON contains cwd. Trust that over $CLAUDE_PROJECT_DIR or $PWD.
  • Watch the git commit-tree substring trap. Match git[[:space:]]+commit($|[[:space:]]), not just the substring commit. Otherwise you block unrelated subcommands.
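The commit-tree trap from the last bullet is easy to demonstrate. A minimal sketch using the pattern quoted above; the helper name is hypothetical, and the command string is assumed to have been pulled from the hook input already (for example from its tool_input.command field):

```shell
# Hypothetical helper: succeed when the command string invokes `git commit`.
# The pattern is the one quoted above; grep -E keeps it portable.
is_git_commit() {
  printf '%s\n' "$1" | grep -Eq 'git[[:space:]]+commit($|[[:space:]])'
}
```

With this pattern, git commit -m "x" and cd repo && git commit --amend match, while git commit-tree and git status do not. It is still a sketch: git -C path commit slips past it, and a quoted string that happens to contain git commit would trigger it.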

The dispatch logic, helpers, and full hash computation come after the prelude. The full file is around 7.5 KB. When installing on a new machine, copy the canonical version verbatim; do not retype it.
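The dispatch itself is not reproduced in this article. One plausible shape, shown only to illustrate how the two modes and the fail-open rule fit together; every name here is a placeholder rather than the author's code:

```shell
# Hypothetical dispatch; handler names are placeholders, not the
# author's functions.

# Fail-open: warn on stderr and allow the action.
fail_open() {
  printf 'review-gate: %s; allowing action\n' "$1" >&2
  exit 0
}

run_precommit() { :; }   # placeholder for the commit gate
run_size_warn() { :; }   # placeholder for the size warning

dispatch() {
  case "$1" in
    precommit) run_precommit ;;
    size-warn) run_size_warn ;;
    *)         fail_open "unknown mode '$1'" ;;
  esac
}
```

An unrecognized mode is treated like any other internal error: the hook warns and exits 0 instead of blocking the tool call.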

§The Prompt Template

The prompt template is the forcing function. The hook substitutes a {{DIFF}} placeholder with the materialized git diff --cached HEAD; git diff output (capped at 200 KB) and hands the result back as the deny reason. The next agent turn sees that prompt as input.

The structure that actually works:

  • Opens with framing: “Your job is to find every issue.”
  • Inlines the diff so the sub-agent doesn’t have to compute its own.
  • Constraints: numbered findings, file:line, no greeting, no positive notes. “Looks good” is explicitly listed as a failure mode.
  • An “if you have nothing” list that forces re-examination of the most common gaps: backend consistency, doc-vs-code drift, test-vs-claim, magic numbers, error paths, sink safety.
  • Required output sections. This is the forcing function. An agent can decline to think. It cannot silently decline to produce a section: the missing section is visible.
  • Confirmation-bias mitigation for re-reviews: prior findings should already be addressed, AND the new code must be scanned for issues introduced by the fixes themselves.

If you remove the required output sections, the gate quietly converts back into a vibe-check. The headings are not decoration. They are what prevents “I considered that” from passing as review.
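The {{DIFF}} substitution described at the top of this section can be sketched without any templating tool. This assumes the template holds a single placeholder and that a byte-level cap is acceptable (it can cut a multi-byte character at the boundary); the function name is hypothetical:

```shell
# Hypothetical helper: print the template with the first {{DIFF}}
# replaced by the diff, truncated to a byte cap (default 200 KB).
render_prompt() {
  local template="$1" ph='{{DIFF}}' diff_text
  diff_text="$(printf '%s' "$2" | head -c "${3:-204800}")"
  case "$template" in
    *"$ph"*)
      # Text before the placeholder, then the diff, then the text after it.
      printf '%s%s%s\n' "${template%%"$ph"*}" "$diff_text" "${template#*"$ph"}"
      ;;
    *)
      # No placeholder: append the diff rather than drop it.
      printf '%s\n%s\n' "$template" "$diff_text"
      ;;
  esac
}
```

Splitting the template around the placeholder avoids pattern-replacement expansion, where characters such as & in the diff can be reinterpreted by newer bash versions.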

§Settings Wiring

Add the following entries to ~/.claude/settings.json. They are additive. Existing Notification and Stop hooks remain unchanged.

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "$HOME/.claude/hooks/review-gate.sh precommit" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit|Write|MultiEdit",
        "hooks": [
          { "type": "command", "command": "$HOME/.claude/hooks/review-gate.sh size-warn" }
        ]
      }
    ]
  }
}

§What the Agent Does When Blocked

The deny reason returned by the hook contains the full review prompt plus the materialized diff. The agent’s next turn fires with this as input. Required behavior:

  1. Read the deny reason. Do not write the review yourself. Spawn the sub-agent.
  2. Pass the prompt verbatim to a general-purpose Agent tool call.
  3. Read the agent’s findings.
  4. Fix every finding. No deferrals. A failed test counts as a finding to fix.
  5. Re-spawn the sub-agent against the new working-tree diff.
  6. Loop until Verdict: CLEAN.
  7. Write .claude/.last-review.md with the current diff hash and verdict: CLEAN.
  8. Re-attempt git commit. The gate now allows it.

If round three still has findings, do not iterate further. The change is too large for a single commit. git reset, re-stage smaller batches, and commit each piece separately through the gate.
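Step 7 is a three-line file write. A minimal sketch using the artifact fields shown earlier; the helper name is hypothetical, and findings_count is fixed at 0 on the assumption that the artifact is only written after a CLEAN verdict:

```shell
# Hypothetical helper: write the review artifact for a given diff hash.
# Field names follow the artifact layout shown earlier in this article.
write_artifact() {
  local artifact="$1" diff_hash="$2"
  mkdir -p "$(dirname "$artifact")" || return 1
  printf 'reviewed_hash: %s\nfindings_count: 0\nverdict: CLEAN\n' "$diff_hash" > "$artifact"
}
```

Because .claude/ is gitignored, writing the file does not change the diff, so the recorded hash is still valid when the commit is retried.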

§Verification: Proving the Gate Actually Works

Once the files are installed, walk through this end-to-end test. Skip it and you’ll discover the gate is misconfigured the first time you actually need it.

  1. Trivial change. One-line typo fix. Try to commit. The gate allows it (diff is under 20 lines).
  2. Doc-only change. Add 50 lines to a .md file. Try to commit. The gate allows it (no code files changed).
  3. Real code change, no artifact. Add 50 lines to a .go file. Try to commit. Hook denies with the full review prompt.
  4. Spawn the sub-agent. Fix any findings, write the artifact with the matching diff hash and verdict: CLEAN. Re-attempt commit. Gate allows.
  5. Edit again after artifact. Modify the same file. Try to commit. Hash mismatch causes the gate to re-deny.
  6. Kill switch. touch ~/.claude/.review-gate-disabled. Commit succeeds without review. Remove the flag and re-test.
  7. Plan mode. touch .claude/.plan-mode. Gate is skipped.

The hook script ships with a self-test that exercises a dozen scenarios via synthetic JSON input. Run it once at install time and again after any edit to the script.
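The shipped self-test is not shown here. As an illustration of the approach only, a synthetic-input check can be as small as the following; the function name and the way the decision is detected are assumptions:

```shell
# Hypothetical harness: pipe synthetic hook input to a gate command and
# check whether its stdout contains a deny decision.
# Usage: expect_decision <command> <mode> <json-input> <allow|deny>
expect_decision() {
  local out got
  out="$(printf '%s' "$3" | "$1" "$2" 2>/dev/null)"
  if printf '%s' "$out" | grep -q '"permissionDecision"[[:space:]]*:[[:space:]]*"deny"'; then
    got=deny
  else
    got=allow
  fi
  [ "$got" = "$4" ]
}
```

A case would then read expect_decision "$HOME/.claude/hooks/review-gate.sh" precommit '{"cwd":"/tmp/repo","tool_input":{"command":"git status"}}' allow, where /tmp/repo stands in for a scratch repository you set up for the test.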

§The Metric That Matters

One number to hold the agent to: PRs that needed more than one review round. Goal: zero.

Any commit message containing fix(...): address PR #N critical review (or any equivalent “review fix” pattern) on a feature branch is automatic evidence the gate was bypassed and needs hardening. Treat the appearance of such a commit as a process failure, not as evidence of careful work after the fact.
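Spotting those commits can be automated. A rough sketch, assuming commit subjects follow the fix(...): address PR ... review shape quoted above; the function name and the exact pattern are mine and will need tuning for your own conventions:

```shell
# Hypothetical check: print commit subjects that look like post-review fixes.
# Reads one subject per line on stdin; exits non-zero when none match.
find_review_fixes() {
  grep -Ei '^fix(\([^)]*\))?!?: address PR (#[0-9]+ )?(critical )?review'
}

# Example feed, assuming the branch is based on origin/main:
#   git log --format=%s origin/main..HEAD | find_review_fixes
```

Any output is a prompt to investigate how the change got past the gate.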

The 800-line soft size cap is the second metric. The PostToolUse systemMessage warning fires once cumulative diff crosses the threshold. A 1700-line diff is structurally hard to review, no matter how careful the gate is. The single biggest intervention is keeping changes small.
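The soft cap can be sketched as a single function. This assumes jq is installed and that the caller has already counted the changed lines against origin/main; the function name and message wording are hypothetical:

```shell
# Hypothetical helper: print a systemMessage JSON warning when the line
# count exceeds the threshold (default 800); print nothing otherwise.
size_warning() {
  local lines="$1" limit="${2:-800}"
  [ "$lines" -gt "$limit" ] || return 0
  jq -n --arg msg "Cumulative diff is $lines lines (soft cap $limit). Consider splitting the change." \
    '{systemMessage: $msg}'
}
```

Staying silent below the threshold matters as much as the warning: a hook that speaks on every edit gets ignored.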

§What Did Not Work

Three approaches I tried before the pre-commit hook, in order, all failed.

In-context self-review checklist as a memory entry. The agent reads the checklist, claims to apply it, doesn’t actually walk it procedurally. Vibe-check failure mode.

Procedural checklist with explicit items. Better than freeform but still skipped under time pressure or context pressure. Items get noted as “I should fix this” and then quietly moved past as the conversation continues.

Stop hook (post-tool, end-of-turn). Wrong trigger point. By the time Stop fires, the commit may already have happened. The cost cycle has already started.

The pre-commit hook works because the agent literally cannot commit without it returning success. There is no introspection, no “I think I’ve done enough.” The gate is mechanical. That is the entire point.

§The Memory Entry That Supports the Gate

The memory entry is short. It does not duplicate the procedure. Its only job is to remind the agent the gate exists and what to do when blocked:

---
name: Pre-commit review gate (mechanical, externalized)
description: Adversarial review of every code change happens BEFORE commit, enforced by a hook.
type: feedback
---

The previous in-context "self-review checklist" was unreliable: the agent
treated it as a vibe-check rather than a procedure. Replacement: a
mechanical pre-commit gate at ~/.claude/hooks/review-gate.sh blocks
`git commit` until an adversarial sub-agent review of the working-tree
diff returns `Verdict: CLEAN` and the artifact at
`.claude/.last-review.md` carries the matching `reviewed_hash`.

When the hook fires:
1. Read the block reason for the full sub-agent prompt.
2. Spawn a `general-purpose` Agent with that prompt verbatim.
3. Fix every finding in the working tree. No "follow-up" deferrals.
4. Re-spawn until `Verdict: CLEAN`. Cap: 3 rounds; if round 3 still has
   findings, the change is too large and must be split.
5. Write the artifact and re-attempt `git commit`.

Source-of-truth files:
- ~/.claude/hooks/review-gate.sh
- ~/.claude/hooks/review-prompt-template.md
- ~/.claude/settings.json (PreToolUse + PostToolUse entries)

The gate does the enforcement. The memory entry just hands the agent a map of where the gate lives so it can comply efficiently when blocked, rather than burning a turn figuring out what hit it.

§Known Limitations

A few things this gate does not solve, by design or by current limitation:

  • The size cap is soft. The 800-line warning is a systemMessage, not a deny. The agent can ignore it. A future hardening would convert it to a deny on PreToolUse(Edit|Write) once the diff crosses some threshold (say 1500 lines), forcing a split. Right now the cap relies on the agent reading the warning and acting on it.
  • The diff hash covers the raw diff text, with no normalization. Whitespace changes, formatter runs, and reordering all change the hash. That is correct behavior (any change requires a new review), but it does mean re-running gofmt after a clean review invalidates the artifact and you have to re-review.
  • The gate doesn’t run in CI. It’s a developer-machine forcing function. CI still needs its own quality gates: lint, security scanning, tests. The gate complements CI; it doesn’t replace it.
  • The sub-agent inherits project memory. The same gentle reviewer instincts that fail in the parent context can leak into the sub-agent. The prompt’s adversarial framing and required output sections are the primary mitigation. Eternal vigilance against “looks good” responses is required.
  • The iteration cap and the size cap interact. A 1700-line PR may converge in four rounds rather than three. The cap should be enforced more strictly going forward, with the size cap doing more work earlier.

§Why This Belongs in the Series

The “AI on a Leash” thesis is that AI productivity is real but unreviewable at speed, and the only answer that holds up over time is a verification toolchain that catches what humans no longer have hours to catch. The Go articles cover the static and dynamic verification layers: linters, race detection, mutation testing, coverage. Those layers tell you whether the code is correct in the small.

The pre-commit gate is a different layer. It tells you whether the code is correct against the claims being made about it: the docstrings, the doc paragraphs, the test names, the commit message. Those claims are invisible to a linter and are exactly where AI fails most often. The agent writes a doc paragraph that subtly disagrees with the function it documents. The agent writes a test named TestHandlesEmptyInput that does not actually exercise the empty-input path. The lint suite is fine with both.

The pre-commit gate forces a structured re-read of every change against its own claims, before that change becomes a commit. Combined with the toolchain layers, the leash starts to feel like it has the right number of links.

§TL;DR for the Next Agent (or the Next Engineer)

  1. Install ~/.claude/hooks/review-gate.sh (executable) and ~/.claude/hooks/review-prompt-template.md. Copy them, do not retype.
  2. Add the PreToolUse(Bash) and PostToolUse(Edit|Write|MultiEdit) entries to ~/.claude/settings.json pointing at the script with precommit and size-warn arguments respectively.
  3. Add the memory entry pointing at the gate.
  4. When the gate denies a git commit, spawn a general-purpose sub-agent with the prompt the hook handed back, fix every finding in the working tree, loop until Verdict: CLEAN, write the artifact, retry the commit.
  5. If you see a fix(...): address PR review commit on a feature branch, that’s evidence the gate failed. Investigate and harden. Do not normalize the bypass.

The leash works only as long as it’s mechanical. The moment self-review becomes optional again, the cycle comes back.
