Trust, but verify: making an AI check its own work

1 July 2026

One failure mode will sink an AI-built project faster than any bug: the agent tells you the work is good, and it sounds every bit as convincing when it's wrong as when it's right.

Ask an agent whether a change is balanced and it will tell you, fluently, that the difficulty curve is intact — with no privileged access to whether that's true, because it's producing what a correct answer sounds like. If your quality bar is "the agent says it's fine," you don't have one. This is the practice the whole project rests on, so it gets the longest post.

Build gates, don't trust judgement

What made building this way hold together was that I stopped trusting anyone's judgement about correctness, mine or the agent's, and built gates instead: automated checks that pass or fail with no room for a persuasive story in between.

Three tiers ran on this project, cheapest first.

A typecheck and a test suite on every change — table stakes, with one house rule that matters more than usual: every engine or generator change ships with a test beside it. An agent that has to write the test can't hand-wave the behaviour, because the test is the behaviour, pinned down. It also leaves a tripwire for the next session that has no memory: break the old behaviour and something goes red.
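To make the house rule concrete, here is a minimal sketch of a change shipping with its test beside it. The rule itself (`resolveDamage`, with armour subtracting from attack and a floor of 1) is invented for illustration and is not taken from the game's engine; the assertion helper stands in for whatever test framework a project actually uses.

```typescript
// Hypothetical engine rule, invented for this sketch: armour reduces
// damage one-for-one, but a hit always deals at least 1.
function resolveDamage(attack: number, armour: number): number {
  return Math.max(1, attack - armour);
}

// Minimal assertion helper so the sketch needs no test framework.
function expectEqual(actual: number, expected: number, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${expected}, got ${actual}`);
  }
}

// The test that ships beside the change. Each line pins one behaviour,
// so a later session that alters the rule turns something red.
function testResolveDamage(): void {
  expectEqual(resolveDamage(10, 3), 7, "armour subtracts from attack");
  expectEqual(resolveDamage(2, 5), 1, "damage never drops below 1");
  expectEqual(resolveDamage(5, 5), 1, "an exact block still chips for 1");
}

testResolveDamage();
```

The point of the sketch is the pairing, not the rule: the edge cases are written down as numbers, so there is nothing left to describe persuasively.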

A solver used as a correctness gate. The game's core promise is that the generator can never strand a player behind an impossible encounter. That isn't checked by feel — there's an exhaustive solver that, given an encounter and a build, decides whether it's winnable, and no mandatory content ships without passing it. The promise stopped being something the agent asserted and became a function that returns true or false.
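As an illustration of the shape such a gate can take, the sketch below exhaustively searches a toy turn-based fight. Everything in it is an assumption made for the example: the `Build` and `Encounter` fields, the two available actions (attack or drink a potion), and the rule that the enemy strikes back after each player turn. The real solver and combat model are not shown in this post.

```typescript
// All types and rules here are hypothetical stand-ins for a real combat model.
interface Build { hp: number; attack: number; potions: number; potionHeal: number; }
interface Encounter { enemyHp: number; enemyAttack: number; }
interface FightState { hp: number; enemyHp: number; potions: number; }

// Exhaustive search: tries every sequence of actions and returns true
// if at least one of them kills the enemy before the player dies.
function isWinnable(enc: Encounter, build: Build): boolean {
  const seen = new Set<string>(); // states already explored without a win

  function search(s: FightState): boolean {
    if (s.enemyHp <= 0) return true;
    if (s.hp <= 0) return false;
    const key = `${s.hp}|${s.enemyHp}|${s.potions}`;
    if (seen.has(key)) return false;
    seen.add(key);

    // Action 1: attack; the enemy strikes back only if it survives.
    const enemyAfter = s.enemyHp - build.attack;
    const afterAttack: FightState = {
      hp: enemyAfter <= 0 ? s.hp : s.hp - enc.enemyAttack,
      enemyHp: enemyAfter,
      potions: s.potions,
    };
    if (search(afterAttack)) return true;

    // Action 2: drink a potion (capped at max hp), then take the enemy's hit.
    if (s.potions > 0) {
      const afterPotion: FightState = {
        hp: Math.min(build.hp, s.hp + build.potionHeal) - enc.enemyAttack,
        enemyHp: s.enemyHp,
        potions: s.potions - 1,
      };
      if (search(afterPotion)) return true;
    }
    return false;
  }

  return search({ hp: build.hp, enemyHp: enc.enemyHp, potions: build.potions });
}

// The gate: mandatory content is rejected unless the solver says yes.
function assertShippable(enc: Encounter, build: Build): void {
  if (!isWinnable(enc, build)) {
    throw new Error("Encounter is not winnable with this build; refusing to ship.");
  }
}

const starterBuild: Build = { hp: 10, attack: 3, potions: 1, potionHeal: 5 };
assertShippable({ enemyHp: 9, enemyAttack: 4 }, starterBuild); // three attacks win
const strandsPlayer = isWinnable({ enemyHp: 12, enemyAttack: 5 }, starterBuild);
```

In this toy model the second encounter cannot be won by any ordering of attacks and the single potion, so `strandsPlayer` comes back `false` and the gate would block it. The search is only feasible because the fight is a small, closed state space.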

Balance simulations as ship-gates. This is the tier I most came to depend on. For the fuzzy properties — is this build viable at depth, does this new item flatten the curve, are the four playstyles still roughly at parity — a harness (analyze.ts) plays thousands of simulated runs and measures the answer. New systems don't ship until their simulation passes. Balance became a number the machine reports rather than a claim the agent makes.
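A rough sketch of a parity gate in this spirit follows. It is not the contents of analyze.ts: the `Playstyle` shape, the per-floor survival model, the playstyle names, the run count and the tolerance are all assumptions chosen to keep the example small. The seeded generator is mulberry32, a common small PRNG, used here so the harness reports the same number on every invocation.

```typescript
// mulberry32: a small seeded PRNG, so simulated runs are reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical stand-in for a playstyle: one number for "chance to clear a floor".
interface Playstyle { name: string; floorSurvival: number; }

// One simulated run: descend until a floor is failed or the depth cap is hit.
function simulateRun(style: Playstyle, rng: () => number, maxDepth: number = 20): number {
  let depth = 0;
  while (depth < maxDepth && rng() < style.floorSurvival) depth++;
  return depth;
}

// Average depth over many runs; run i always uses seed baseSeed + i.
function meanDepth(style: Playstyle, runs: number, baseSeed: number = 1): number {
  let total = 0;
  for (let i = 0; i < runs; i++) {
    total += simulateRun(style, mulberry32(baseSeed + i));
  }
  return total / runs;
}

interface ParityReport { pass: boolean; spread: number; means: number[]; }

// The gate: fail if the best and worst playstyles drift further apart
// than the tolerance (measured in floors of mean depth).
function parityGate(styles: Playstyle[], runs: number, tolerance: number): ParityReport {
  const means = styles.map((s) => meanDepth(s, runs));
  const spread = Math.max(...means) - Math.min(...means);
  return { pass: spread <= tolerance, spread, means };
}

// Four made-up playstyles; the names and numbers are illustrative only.
const report = parityGate(
  [
    { name: "bruiser", floorSurvival: 0.82 },
    { name: "caster", floorSurvival: 0.8 },
    { name: "rogue", floorSurvival: 0.81 },
    { name: "tank", floorSurvival: 0.95 },
  ],
  5000,
  1.5,
);
```

What matters is that `report.pass` is computed rather than asserted: a change that lifts one playstyle well clear of the others shows up as a spread over the tolerance, whatever the accompanying prose says.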

Why this suits an AI collaborator

There's a useful symmetry here. The thing that makes an LLM risky — confident output with no grounding — is defused by the thing an LLM is genuinely good at: producing a lot of code, test and simulation code included.

So you point the agent's fluency at its own unreliability. It's excellent at writing the simulation that would catch its own bad balance change, and at writing the test that pins the behaviour it might later break. You spend its capability on the gates, and then the gates decide what's true. A collaborator you can't trust to judge the work is completely trustworthy at building the thing that judges the work.

The gates earned it

None of this is hygiene theatre; the gates overturned real decisions throughout. A balance simulation caught a defensive stat that, pushed far enough, quietly outran the entire difficulty curve — one dominant build that would have hollowed out the game's variety, invisible in playtesting and to the agent's confidence alike. A gear simulation caught a trinket that read as fine and was secretly a curve-flattener. Each time, the measurement said no in a voice the confident prose couldn't argue with. (The next post is all receipts.) That's the mark of a gate worth having: it tells you no when everyone, human and agent, believed the answer was yes.

The determinism dividend

One design choice multiplied the value of every gate: the game is deterministic — same inputs, same outputs, every time. That's a game-design property, but it pays off just as much in development, because determinism is what makes a system checkable at all. You can only write an exhaustive solver for a fight if the fight is a closed, knowable system; you can only replay a run to verify it if it reproduces exactly. Non-determinism wouldn't just make bugs flaky — it would make the gates themselves untrustworthy, and for an AI-built project the gates are the thing you were trusting.
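The replay half of that claim can be sketched in a few lines. The state fields, the three actions and their effects below are invented for the example and are not the game's real rules; the only property that matters is that `step` is a pure function, so the same start state and the same action list always produce the same end state.

```typescript
// Hypothetical run model, invented for this sketch.
type Action = "attack" | "defend" | "descend";
interface RunState { depth: number; hp: number; gold: number; }
interface RunLog { start: RunState; actions: Action[]; recordedEnd: RunState; }

// Pure transition: no clock, no randomness, no hidden state.
function step(s: RunState, a: Action): RunState {
  switch (a) {
    case "attack":
      return { ...s, hp: s.hp - 2, gold: s.gold + 5 };
    case "defend":
      return { ...s, hp: s.hp - 1 };
    case "descend":
      return { ...s, depth: s.depth + 1 };
  }
}

// Replaying is just folding the recorded inputs over the start state.
function replay(start: RunState, actions: Action[]): RunState {
  return actions.reduce(step, start);
}

// Verification: the replayed end state must match the recorded one exactly.
function verifyRun(log: RunLog): boolean {
  const end = replay(log.start, log.actions);
  return (
    end.depth === log.recordedEnd.depth &&
    end.hp === log.recordedEnd.hp &&
    end.gold === log.recordedEnd.gold
  );
}

const honestLog: RunLog = {
  start: { depth: 1, hp: 10, gold: 0 },
  actions: ["attack", "defend", "descend"],
  recordedEnd: { depth: 2, hp: 7, gold: 5 },
};
const tamperedLog: RunLog = { ...honestLog, recordedEnd: { depth: 2, hp: 7, gold: 500 } };

const honestOk = verifyRun(honestLog);     // true: replay reproduces the record
const tamperedOk = verifyRun(tamperedLog); // false: the gold does not add up
```

If `step` consulted a clock or an unseeded random source, `verifyRun` could fail on an honest log, and a failure would no longer tell you anything.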

If there's a single practice to carry out of this series, it's this one. An AI will tell you its work is correct, balanced, and safe in a voice you can't distinguish from the truth. Build gates that answer those questions independently — tests, a solver, simulations — and let the gates decide; spend the agent's fluency on building them. Get that right and the agent's overconfidence stops being a liability, because nothing it merely claims reaches the main branch unchecked. Which sets up the next post nicely: the times it was confidently, memorably wrong, and which gate caught each.
