> Cage the agent. Ship the code.
You paste a task into Claude Code. Ten minutes later you have 400 lines across six files, three of them in folders that do not match your convention, two imports using relative paths you killed off last quarter, and a new util that duplicates one you already have. The code runs. The review hurts.
This is not a model problem. It is a constraints problem.
An agent will happily respect your architecture if you tell it the rules in a way the machine can verify. Telling it in a prompt is the weakest version of that. Telling it through tools that block bad code is the strongest. Most teams stop at the prompt and wonder why output drifts.
Here is the layered setup I run. It works with any agent (Claude Code, Codex, Cursor, Aider), and most of it is not language specific.
The layers
Think of it as a funnel. Each layer catches a different class of mistake before it reaches your branch.
| Layer | What it does |
|---|---|
| Type system | Shapes and contracts. |
| Linter | Patterns and complexity. |
| Formatter | Style. |
| Pre commit hooks | The gate before history. |
| Specs and codegen | Structure the agent cannot invent around. |
| Feedback loop | The agent fixes its own output before you see it. |
A prompt rule says "please do X". These layers say "you literally cannot commit if you do not do X". Different category of enforcement.
Layer 1: the type system is a contract
Strict types are the cheapest guardrail you will ever add. They pay for themselves in the first wrong call the agent makes.
TypeScript, tsconfig.json:
{
  "compilerOptions": {
    "strict": true,
    "noImplicitAny": true,
    "noUncheckedIndexedAccess": true,
    "exactOptionalPropertyTypes": true,
    "baseUrl": ".",
    "paths": {
      "@components/*": ["src/components/*"],
      "@lib/*": ["src/lib/*"]
    }
  }
}
Python, pyproject.toml with mypy:
[tool.mypy]
strict = true
disallow_untyped_defs = true
disallow_any_generics = true
warn_unused_ignores = true
Go gives you this for free. Rust gives you more than you asked for. Same idea either way: the compiler rejects sloppy output so the agent has to tighten it.
Path aliases matter more than people think. If the agent sees @lib/db once in an example, it stops writing ../../../lib/db. One config line, hundreds of cleaner imports.
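To make the contract idea concrete, here is a minimal Python sketch of the kind of code a strict type checker pushes the agent toward. The names (User, find_user, display_name) and the in-memory store are hypothetical. The point is the Optional return: mypy will not let a caller touch the result without handling the missing case, which is the same pressure noUncheckedIndexedAccess applies in TypeScript.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    id: str
    email: str


# Hypothetical in-memory store, standing in for a real database.
_USERS: dict[str, User] = {
    "u1": User(id="u1", email="ada@example.com"),
}


def find_user(user_id: str) -> Optional[User]:
    """Return the user, or None when the id is unknown."""
    return _USERS.get(user_id)


def display_name(user_id: str) -> str:
    """Return the local part of the user's email, or "unknown"."""
    user = find_user(user_id)
    # Without this check, mypy reports that "user" may be None on the
    # attribute access below, so the shortcut never reaches a commit.
    if user is None:
        return "unknown"
    return user.email.split("@")[0]
```

The agent can still write the unchecked version. It just cannot get it past the type checker.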
Layer 2: the linter enforces patterns
A type system catches shape errors. A linter catches pattern errors. This is where you encode the architectural rules the agent keeps forgetting.
ESLint, the rules that actually move the needle for AI output:
import react from "eslint-plugin-react";
export default {
  // The react/* rule below only resolves if the plugin is registered.
  plugins: { react },
  rules: {
    "max-lines-per-function": ["error", 60],
    "max-lines": ["error", 250],
    "complexity": ["error", 10],
    "no-restricted-imports": ["error", {
      "patterns": [{
        "group": ["../*"],
        "message": "Use @lib, @components, @hooks aliases."
      }]
    }],
    "react/function-component-definition": ["error", {
      "namedComponents": "arrow-function"
    }]
  }
};
Python, ruff.toml:
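line-length = 100
[lint]
# PLR0913 and PLR0915 are the rules that read the pylint limits below.
select = ["E", "F", "I", "N", "UP", "B", "SIM", "C90", "PLR0913", "PLR0915"]
[lint.mccabe]
max-complexity = 10
[lint.pylint]
max-args = 5
max-statements = 40
Go, .golangci.yml: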
linters:
  enable:
    - gocyclo
    - funlen
    - gocognit
    - revive
    - errcheck
linters-settings:
  funlen:
    lines: 80
    statements: 40
  gocyclo:
    min-complexity: 10
Notice the shape. Complexity cap, length cap, import rules. These three alone stop the "megafunction that does everything" output that agents love to produce when they get nervous.
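For a feel of what a complexity cap actually measures, here is a rough Python sketch that approximates cyclomatic complexity by counting branch points with the standard ast module. It is a simplification, not how ruff, ESLint, or gocyclo compute their scores, and the function names are made up for illustration.

```python
import ast

# Node types that add a decision point. This approximates McCabe
# complexity; it is not the exact rule any particular linter uses.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def complexity(source: str) -> dict[str, int]:
    """Map each function name in `source` to 1 + its number of branch points."""
    scores: dict[str, int] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = 1
            for child in ast.walk(node):
                if isinstance(child, _BRANCH_NODES):
                    score += 1
                elif isinstance(child, ast.BoolOp):
                    # "a and b and c" adds two extra paths.
                    score += len(child.values) - 1
            scores[node.name] = score
    return scores


def over_limit(source: str, limit: int = 10) -> list[str]:
    """Names of functions whose score exceeds `limit`, sorted."""
    return sorted(name for name, score in complexity(source).items() if score > limit)


SAMPLE = '''
def simple(x):
    return x + 1

def branchy(x, y):
    if x and y:
        return 1
    for i in range(x):
        if i % 2:
            return i
    return 0
'''
```

Every if, loop, and boolean operator adds a path. A cap of 10 forces the split into helpers long before a function becomes unreviewable.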
Layer 3: formatter, non negotiable
Prettier, gofmt, rustfmt, ruff format, black. Pick one per language and wire it up. The agent stops spending tokens on indentation and quote style, you stop reading style noise in diffs. There is no debate here.
npx prettier --write .
cargo fmt
gofmt -w .
ruff format .
Layer 4: pre commit hooks, the gate
This is where most teams leave money on the table. You have a linter and types. You trust humans (and agents) to run them. They do not.
Husky for JavaScript and TypeScript projects:
npm install -D husky lint-staged
npx husky init
Then .husky/pre-commit:
npx lint-staged
npm run typecheck
And package.json:
{
  "lint-staged": {
    "*.{ts,tsx}": ["eslint --max-warnings=0", "prettier --write"],
    "*.{js,jsx,json,md}": ["prettier --write"]
  },
  "scripts": {
    "typecheck": "tsc --noEmit"
  }
}
Now the agent cannot land a commit that fails typecheck or lint. Not "should not". Cannot. The only way around it is git commit --no-verify, which skips hooks entirely.
For polyglot repos or non JS stacks, use the pre-commit framework. It is a Python tool but it runs hooks for any language.
.pre-commit-config.yaml:
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.5.0
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.10.0
    hooks:
      - id: mypy
        args: [--strict]
  - repo: https://github.com/golangci/golangci-lint
    rev: v1.59.0
    hooks:
      - id: golangci-lint
  - repo: local
    hooks:
      - id: cargo-clippy
        name: cargo clippy
        entry: cargo clippy -- -D warnings
        language: system
        types: [rust]
        pass_filenames: false
Install once with pre-commit install. The hook runs on every commit across every language in the repo.
Add a commit-msg hook while you are here. Agents write commit messages too, and they drift toward "update stuff" if you let them.
# .husky/commit-msg
npx --no -- commitlint --edit "$1"
Or a short shell check for conventional commits:
# .husky/commit-msg
pattern='^(feat|fix|chore|docs|refactor|test|perf)(\(.+\))?: .{1,72}$'
# Check the subject line only, so a body line cannot satisfy the pattern.
head -n 1 "$1" | grep -qE "$pattern" || {
  echo "Commit must be conventional (feat:, fix:, chore:, ...)."
  exit 1
}
A pre-push hook is where the expensive checks go. Unit tests, build, contract tests. Slower, but it runs way less often.
# .husky/pre-push
npm run build
npm test
Side effect the agent will notice: when a hook fails, it gets the error output and fixes the code on the next turn. You are not just blocking bad commits, you are teaching the agent what "done" looks like.
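The conventional commit pattern is easy to get subtly wrong, so it is worth exercising outside the hook. Below is a Python sketch of the same check, assuming only the subject line should be validated. is_conventional is a hypothetical helper, and the regex is copied from the shell version above.

```python
import re

# Same pattern as the shell hook: type, optional scope, then a 1-72 char summary.
PATTERN = re.compile(r"^(feat|fix|chore|docs|refactor|test|perf)(\(.+\))?: .{1,72}$")


def is_conventional(message: str) -> bool:
    """Validate the first line of a commit message; the body is free-form."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    return PATTERN.match(subject) is not None
```

A handful of assertions against this function will tell you quickly whether the pattern rejects what you think it rejects.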
Layer 5: specs, the structure the agent cannot invent around
This is the one most teams skip, and it is the one that pays the most.
Write the spec first. Generate code from it. The agent fills in behavior, not structure.
API contracts, OpenAPI:
# openapi.yaml
paths:
  /users/{id}:
    get:
      operationId: getUser
      parameters:
        - name: id
          in: path
          required: true
          schema: { type: string, format: uuid }
      responses:
        "200":
          description: The requested user.
          content:
            application/json:
              schema: { $ref: "#/components/schemas/User" }
components:
  schemas:
    User:
      type: object
      required: [id, email, createdAt]
      properties:
        id: { type: string, format: uuid }
        email: { type: string, format: email }
        createdAt: { type: string, format: date-time }
Then generate:
# TypeScript types
npx openapi-typescript openapi.yaml -o src/api/schema.ts
# Python server stubs
openapi-generator-cli generate -i openapi.yaml -g python-fastapi -o server/
# Go client
oapi-codegen -package api openapi.yaml > api/client.go
Now the agent writes the handler body. It does not invent the URL, the status codes, the payload shape, or the field names. Those came from the spec.
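One way to keep handlers honest against the spec is a small contract test. The sketch below inlines the User schema from the OpenAPI snippet as a Python dict and checks a handler's response against its required list and property names. get_user and missing_or_extra_fields are hypothetical names, and a real setup would load the schema from openapi.yaml rather than inline it.

```python
# The User schema from the spec, inlined so the sketch has no dependencies.
USER_SCHEMA = {
    "type": "object",
    "required": ["id", "email", "createdAt"],
    "properties": {
        "id": {"type": "string", "format": "uuid"},
        "email": {"type": "string", "format": "email"},
        "createdAt": {"type": "string", "format": "date-time"},
    },
}


def missing_or_extra_fields(payload: dict, schema: dict) -> tuple[set, set]:
    """Return (required fields absent from payload, payload fields not in the spec)."""
    missing = set(schema["required"]) - set(payload)
    extra = set(payload) - set(schema["properties"])
    return missing, extra


def get_user(user_id: str) -> dict:
    # Hypothetical handler body: the only part the agent is supposed to write.
    return {
        "id": user_id,
        "email": "ada@example.com",
        "createdAt": "2024-01-01T00:00:00Z",
    }
```

If the agent renames email to mail, the test reports the missing field and the extra field by name. That is a far better signal than a vague 500 in staging.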
Data contracts, JSON Schema or Protobuf:
// order.proto
syntax = "proto3";
// LineItem, Money, and OrderStatus are assumed to be defined elsewhere.
message Order {
  string id = 1;
  string customer_id = 2;
  repeated LineItem items = 3;
  Money total = 4;
  OrderStatus status = 5;
}
Run protoc or buf generate, get types in every language you ship. The agent cannot rename a field and get away with it: the generator will fight back.
Runtime validation at the edges, even in typed languages:
import { z } from "zod";
const CreateOrder = z.object({
  customerId: z.string().uuid(),
  items: z.array(z.object({
    sku: z.string(),
    qty: z.number().int().positive(),
  })).min(1),
});
export async function POST(req: Request) {
  const body = CreateOrder.parse(await req.json()); // throws on drift
  // ...
}
Python with pydantic, Go with go-playground/validator, Rust with validator or serde plus assertions. Pattern is the same. The schema is source of truth, the handler is a thin wrapper, the agent cannot get creative with shapes.
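The same edge validation can be sketched in Python. To keep the example dependency free, this uses plain dataclasses as a stand-in for pydantic. parse_create_order and the error messages are illustrative, and pydantic would give you equivalent checks with far less code. Note that uuid.UUID is a little more lenient about formatting than a strict UUID regex.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    sku: str
    qty: int


@dataclass(frozen=True)
class CreateOrder:
    customer_id: str
    items: tuple[Item, ...]


def parse_create_order(body: dict) -> CreateOrder:
    """Validate an untrusted request body; raise ValueError on any drift."""
    customer_id = body.get("customerId")
    if not isinstance(customer_id, str):
        raise ValueError("customerId must be a string")
    try:
        uuid.UUID(customer_id)
    except ValueError:
        raise ValueError("customerId must be a UUID") from None

    raw_items = body.get("items")
    if not isinstance(raw_items, list) or not raw_items:
        raise ValueError("items must be a non-empty list")

    items = []
    for raw in raw_items:
        if not isinstance(raw, dict):
            raise ValueError("each item must be an object")
        sku, qty = raw.get("sku"), raw.get("qty")
        if not isinstance(sku, str):
            raise ValueError("sku must be a string")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(qty, bool) or not isinstance(qty, int) or qty <= 0:
            raise ValueError("qty must be a positive integer")
        items.append(Item(sku=sku, qty=qty))
    return CreateOrder(customer_id=customer_id, items=tuple(items))
```

The handler only ever sees a CreateOrder. Anything else fails at the edge with a specific message.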
Frontend component contracts, props schemas:
type ButtonProps = {
  variant: "primary" | "secondary" | "ghost";
  size: "sm" | "md" | "lg";
  disabled?: boolean;
  onClick: () => void;
  children: React.ReactNode;
};
export const Button = (props: ButtonProps) => {
  // ...
};
Closed unions, not strings. The agent cannot pass variant="fancy" and have it compile.
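For Python code, typing.Literal gives a similar closed union. A minimal sketch with hypothetical names: mypy rejects any literal outside the union at check time, and the get_args lookup makes the same set available for a runtime check on untrusted input.

```python
from typing import Literal, cast, get_args

Variant = Literal["primary", "secondary", "ghost"]
Size = Literal["sm", "md", "lg"]


def button_class(variant: Variant, size: Size = "md") -> str:
    """Build a CSS class string; mypy flags button_class("fancy") as an error."""
    return f"btn btn-{variant} btn-{size}"


def parse_variant(raw: str) -> Variant:
    """Narrow an untrusted string to the closed union, or fail loudly."""
    if raw in get_args(Variant):
        return cast(Variant, raw)
    raise ValueError(f"unknown variant: {raw!r}")
```

The union is declared once. Both the type checker and the runtime check read from the same definition, so they cannot drift apart.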
Layer 6: the feedback loop
This is the multiplier.
Most agent harnesses already run typecheck and tests, but the loop only works if the signals are loud and fast. Three rules:
- Fail fast. tsc --incremental, cargo check, mypy --install-types cached. Anything over 20 seconds and the agent stops iterating.
- Fail specifically. One error with a file and line beats a summary. Agents read errors well when the shape is consistent.
- Fail early. Lint before test, typecheck before lint, format before all of it. Cheapest check first.
A Makefile or justfile as the single entry point:
.PHONY: check fmt typecheck lint test
# Prerequisites run left to right, so failures surface in this order.
check: fmt typecheck lint test
fmt:
	prettier --check .
	cargo fmt -- --check
	ruff format --check .
typecheck:
	tsc --noEmit
	mypy src
lint:
	eslint .
	cargo clippy -- -D warnings
	ruff check .
test:
	npm test -s
	cargo test --quiet
	pytest -q
Now make check is the contract. The agent runs it, reads failures, fixes, repeats. You never had to describe "what clean code means in this repo". The tools describe it.
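If make is not available, the same fail-fast contract can be sketched as a small Python runner. This is an illustration, not a replacement for the Makefile: run_checks and main are hypothetical helpers, and the commands you pass in are whatever your repo already uses. It runs checks in order, stops at the first failure, and returns that check's name and output so the agent gets one specific error instead of a wall of text.

```python
import subprocess
import sys
from typing import Optional, Sequence, Tuple

# A check is a display name plus the command to run, e.g. ("lint", ["ruff", "check", "."]).
Check = Tuple[str, Sequence[str]]


def run_checks(checks: Sequence[Check]) -> Optional[Tuple[str, str]]:
    """Run checks in order; return (name, output) for the first failure, else None."""
    for name, command in checks:
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            # Stop here: later, slower checks are not worth running yet.
            return name, (result.stdout + result.stderr).strip()
    return None


def main(checks: Sequence[Check]) -> int:
    """Print the first failure to stderr and return a process exit code."""
    failure = run_checks(checks)
    if failure is None:
        return 0
    name, output = failure
    print(f"{name} failed:\n{output}", file=sys.stderr)
    return 1
```

Call main with your checks listed cheapest first and pass the result to sys.exit. The agent then sees exactly one failing stage per run.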
Putting it together: what happens on a real task
You tell Claude Code "add a /orders/:id/cancel endpoint".
- It reads openapi.yaml and finds no cancel op. It adds one, with the right response shapes, because the existing ones showed the pattern.
- It runs npx openapi-typescript and the types regenerate.
- It writes the handler. tsc --noEmit fails once because it returned the wrong discriminated union variant. It fixes it.
- It writes a test. eslint rejects the file for exceeding complexity, it splits into two helpers.
- It commits. lint-staged formats. Husky runs typecheck. Commitlint rejects the message, it rewrites to feat(orders): add cancel endpoint. Commit lands.
- It pushes. Pre-push runs the full test suite. Green.
At no step did you tell the agent "follow our patterns". The tools did.
Honest tradeoffs
What you gain
- Drift stops. The first failing commit teaches the agent the rule.
- Review is about intent, not style. Everything mechanical is already enforced.
- New contributors (human or AI) onboard via make check instead of a 40 page wiki.
- The spec becomes the primary artifact. Code is downstream.
What it costs
- Up front setup. A day or two to wire linters, hooks, specs, and codegen.
- Slower first commit on a new repo. Hooks run, generators run, types build.
- Every rule you add is a rule you have to maintain. Do not over constrain. The agent will thrash against bad rules.
- Spec first development is a mindset shift. Some teammates will fight it.
The rule I have landed on
If a mistake is mechanical, a tool should catch it. If a tool can catch it, a hook should block it. If a hook blocks it, the agent will learn it.
Prompts are the weakest constraint. Types, linters, hooks, and specs are the strong ones. Stack them, and the agent starts looking a lot more like a disciplined junior and a lot less like a clever intern with commit access.
The model is not the bottleneck. Your guardrails are.