Deep Keep

Dev blog — Building a game with an AI coding agent, part 3

← All posts

An agent has no memory: the docs that carry state between sessions

Every session with an AI coding agent starts from a blank slate. Not "a little hazy on last week" — the reasoning it did yesterday, the trade-off you settled together, the thing it half-built and meant to finish are simply not there. Each session is a strong new contributor who has never seen the project.

The constitution from part 2 handles the rules that need to survive that reset. This post is about the state: what are we building, what's already done, and why did we settle it this way. An agent with no answer to those questions will re-solve a closed problem, rebuild a finished feature, or reopen a debate you ended a week ago. The fix is external memory — documents that persist across the context boundary, which the agent is told to read first.

Two documents, two jobs

The project leans on two, deliberately split.

A living backlog (docs/todo.md) answers "what now?" It's organised into Done / In progress / Next / Backlog, and the agent's standing instruction is to keep it current as it works: take something up, move it to In progress; finish it, move it to Done with a one-line note that it shipped and the tests pass; clear anything stale.

That sounds like trivial bookkeeping. It's the single thing that lets a reset-every-time agent behave like a continuous one. Because the backlog is authoritative, a new session doesn't guess what's next or whether something already exists — it reads. "Is the daily-run feature done?" isn't a question for the agent's non-existent memory; it's a line in a file.

A design log (docs/design-plan.md) answers "why?" It holds the reasoning behind each real decision, and its one rule is that it tracks what actually shipped: when a feature it describes gets built, that section is marked "(built)"; when an open question gets settled, the call is written down next to the reasoning that produced it.

The design log has to describe what's true now

This is the discipline that slips easiest. A design doc only helps an agent if it describes the present, not someone's kickoff-day hopes.

A stale design doc is worse than none, because the agent trusts it. Leave a mechanic described three iterations out of date and the agent will faithfully "restore" the code to match, undoing two weeks of work and sounding confident while it does, because the document told it to. So the log reads as a record of decisions as made — what we chose, why, and what we learned when we measured it, wrong first attempts included.

The return on that compounds. Much of this series was drafted straight out of the design log, because the reasoning behind each decision was captured the moment it was made, not reconstructed months later from a diff. Written for the agent, it became the best documentation the project had.

Keep the persistent part small

Once you see how much external memory helps, there's a pull toward persisting everything. It's worth resisting, for the same reason the game keeps its own save-state tiny: the more sprawling and free-form the memory, the more places it can quietly disagree with itself, and the harder it is for the agent to load the relevant slice without drowning in the rest.

So both docs stay disciplined — the backlog terse, the design log structured by decision. A load-bearing, queryable fact earns a real place; a passing detail stays out. An agent reading a tight, current memory behaves well; an agent reading a giant append-only brain-dump gets lost in it, the same way a person would.

The rule of thumb is just to build for the amnesia instead of wishing it away: a terse, authoritative backlog and a design log that stays true to reality, both read first. The memory resets every session; these files are the memory.

Get the constitution and the external memory right and the agent stops drifting and starts accumulating. One failure mode is still untouched, though — the agent declaring its work correct, balanced, and safe when it plainly isn't, in a voice identical to when it's right. That's next, and it holds up everything else.

▶ Play Deep Keep