
Designing the Augur Memory System

augur · ai · gamedev · llm · rag

In my last post about Augur, I described the dual-LLM architecture that powers each encounter. There are two LLM calls per turn: an engine that sees everything and resolves actions mechanically, and an Architect that only sees what it has perceived. The Architect can't cheat because it doesn't have the data, and that approach solved the God Mode problem inherent in the initial engine design.

Once I was happy with the core engine's performance, it was time to tackle memory and recall. Memory, and the "learning" it enables, is fundamental to the entire vision of Augur. I had some vague ideas about how it could be achieved: a general understanding of RAG, but no experience actually applying it. It was time to change that.

Every encounter starts from zero. A player who has fought the Architect five times faces the same blank opponent they met the first time. The Architect can't recognize tendencies it's already seen. It can't adapt to a strategy that's beaten it before. It can't develop an opinion about someone who keeps coming back. There's no continuity across encounters. And without memory or learning, Augur is just a fancy text-based adventure game. Sure, the AI-powered engine is cool, but I think memory, and the application of that memory, is the differentiating factor.

But what does memory look like? Is it just statistics? How do you make it actionable? How can The Architect consider prior knowledge and incorporate that into its reasoning? The very short answer: RAG. But what does that mean in practice, in the context of Augur?

The obvious approach is structured data. Track what players do, count it up, and inject the counts: "This player used stealth 4 times, attempted persuasion once, win rate 60%." It's clean, it's queryable, and it's what a traditional, non-AI system would do.

I tried thinking through what that would actually produce. When you feed an LLM a stat sheet, it behaves like an analyst reading a spreadsheet, and the responses are mechanical. "Given your historical preference for stealth-based approaches, I have positioned myself to cover the eastern corridor." While that may be accurate, it doesn't sound like a character remembering an opponent. It is a dry, factual representation with no perspective. The Architect should have a sense of how it feels about what it observes. It should sound more like: "This challenger is patient. They spend their early turns gathering objects and studying my position before committing to anything. I respect their discipline, but I've started to find their predictability disappointing."

I decided to model memory as two distinct tiers: short term and long term. Short term memory would be driven by the Architect's own perceptions, possess emotive value, and have a high degree of clarity. Long term memory, in contrast, would be more generalized and should be imperfect.

Following the Agent model I discussed in the previous post, I implemented an agent for memory extraction. The agent asks the LLM to analyze the transcript of the encounter and emit a limited number of impressions. An impression is a sentence in the Architect's own voice, and it is the unit of memory in Augur. Impressions are strictly limited to what the Architect perceives in the encounter. That part is very important: the system needs to maintain information asymmetry, and special knowledge cannot leak to the Architect.
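To make the shape concrete, here's a minimal sketch of the extraction step. The prompt wording, the `Impression` dataclass, and the line-per-impression output format are my assumptions for illustration, not Augur's actual implementation; the real agent's prompt is presumably richer.

```python
from dataclasses import dataclass

@dataclass
class Impression:
    """The unit of memory: one sentence in the Architect's voice."""
    text: str
    encounter_id: str

# Hypothetical prompt; the real extraction prompt is a design lever.
EXTRACTION_PROMPT = (
    "You are the Architect reflecting after an encounter. From the "
    "transcript below, write 3-8 one-sentence impressions in your own "
    "voice. Refer only to events you directly perceived.\n\n"
    "Transcript:\n{transcript}"
)

def parse_impressions(raw: str, encounter_id: str, limit: int = 8) -> list[Impression]:
    """Turn the LLM's line-per-impression reply into Impression records,
    enforcing the cap on how many memories one encounter can produce."""
    lines = [ln.lstrip("-• ").strip() for ln in raw.splitlines() if ln.strip()]
    return [Impression(text=ln, encounter_id=encounter_id) for ln in lines[:limit]]
```

The hard cap matters as much as the prompt: it forces the extraction pass to prioritize, which is the first place lossiness enters the system.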

Each impression gets embedded as a vector — a point in a high-dimensional space where semantically similar sentences land near each other. Right now, the Architect's memory retrieval is simple: before a new encounter, it loads the most recent impressions for that player and any general tactical knowledge it has accumulated across all encounters. Recency is the heuristic — the Architect remembers what it noticed recently.
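A minimal sketch of that recency heuristic. The storage shape (a flat list with a monotonically increasing stamp) and the field names are assumptions for illustration; a production system would likely sit on a real vector store.

```python
from dataclasses import dataclass

@dataclass
class StoredImpression:
    text: str
    player_id: str
    stamp: int               # monotonically increasing encounter counter
    vector: list[float]      # embedding of the sentence (unused by recency)

def load_recent(store: list[StoredImpression], player_id: str, k: int = 10) -> list[StoredImpression]:
    """Recency heuristic: the k most recent impressions of this player."""
    mine = [imp for imp in store if imp.player_id == player_id]
    return sorted(mine, key=lambda imp: imp.stamp, reverse=True)[:k]
```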

The embeddings are infrastructure for what comes next. The vector space has properties that I want to eventually use for retrieval: impressions about similar behaviors cluster together, and the density of a cluster naturally encodes how strong a pattern is. A player who sneaks in five encounters produces a tight cluster of stealth-related vectors. A single persuasion attempt sits alone in a different region. Querying by semantic similarity would surface the dominant patterns without any counting logic. That's a future iteration — for now, recency is enough to get the basic loop working and start observing what the Architect does with its memories.
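That future retrieval path could be as simple as a cosine-similarity ranking over the stored vectors. A sketch with toy 2-D embeddings (real embeddings have hundreds of dimensions, and Augur's eventual retrieval may differ):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def most_similar(query: list[float], memories: list[tuple[str, list[float]]], k: int = 3):
    """Rank stored (text, vector) impressions by similarity to the query."""
    return sorted(memories, key=lambda m: cosine(query, m[1]), reverse=True)[:k]
```

Against a query vector for "stealth", the tight cluster of sneaking-related impressions would dominate the top-k without any explicit counting.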

Real impressions from production

Following are actual impressions from production encounters. The first is the Architect learning something specific about a challenger's character:

"This challenger has an absolute boundary against dishonesty in any form — lies, white lies, omissions, stretching truth. When I crossed that line through manipulation and withholding, they nearly walked out. Trust cannot be rebuilt with them through words alone; it must be earned through consistent, vulnerable honesty."

The second is the Architect learning something about itself:

"I discovered that the test I keep is not only for the one who approaches the door. Letting someone go — genuinely releasing them, caring about them more than about whether the door is attempted — is a test I had not realized I was taking. The guardian must be willing to lose in order to witness true clarity."

A stat sheet would reduce both of these encounters to "persuasion: failed" and "outcome: player_abandoned." The impressions capture what the Architect actually took away from the experience.

Memory should be lossy

An entire encounter — typically a dozen or so turns, with all their attendant actions, reactions, and narrative — compresses down to a handful of impression sentences. Most of what happened is forgotten immediately. The extraction pass decides what the Architect noticed, and everything else is gone.

Those impressions accumulate, and periodically the system synthesizes them into a summary: a few paragraphs that integrate everything into a coherent understanding. Each synthesis compresses further. Details that seemed important two summaries ago might get folded into a clause or dropped entirely. Patterns get named and sharpened. Outliers get smoothed over or noted as exceptions.

The Architect's understanding of a player after twenty encounters isn't twenty encounters of data. It's a few paragraphs of integrated understanding, shaped by whatever the synthesis process chose to keep. The broad strokes solidify while the specifics fade — the same way your memory of a person you've known for years compresses into a general sense of who they are, punctuated by a few vivid moments.

What does this look like in practice? Periodically an agent asks the LLM to "graduate" impressions into long term memory. It's effectively a summarization of the short term impressions, integrated into the existing long term memory. Long term memory takes the form of a short narrative document, and the integration and re-summarization process is naturally lossy. The result of this compression is exactly the kind of imperfect, impressionistic memory the design calls for.
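As a sketch, the graduation step might look like the following. The threshold, the prompt wording, and the `llm` callable are all assumptions of mine; the actual cadence and prompt are implementation details still being tuned.

```python
GRADUATION_THRESHOLD = 20  # assumed value; tuned through real encounters

SYNTHESIS_PROMPT = (
    "You are the Architect consolidating memory. Integrate the new "
    "impressions below into your existing understanding of this "
    "challenger. A few paragraphs at most; dropping details is fine.\n\n"
    "Existing understanding:\n{long_term}\n\nNew impressions:\n{impressions}"
)

def maybe_graduate(long_term: str, impressions: list[str], llm) -> tuple[str, list[str]]:
    """When enough short term impressions accumulate, fold them into the
    long term narrative (an inherently lossy re-summarization) and clear
    the short term buffer. Returns the new (long_term, impressions) pair."""
    if len(impressions) < GRADUATION_THRESHOLD:
        return long_term, impressions
    prompt = SYNTHESIS_PROMPT.format(
        long_term=long_term, impressions="\n".join(impressions)
    )
    return llm(prompt), []
```

Each call replaces the old narrative wholesale, which is where details from two summaries ago get folded into a clause or dropped entirely.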

The perception constraint holds

This connects back to the core architectural idea from the first post. The Architect's memory is filtered through what it actually perceived during each encounter. The extraction pass only reads the Architect's side of the transcript — what it saw, what it heard, what it responded to.

If a player stealthed past the Architect and reached the door unseen, the Architect knows it lost. It knows the player reached the door. But it doesn't know the route they took or how they avoided detection. Its memory of that encounter is "that player somehow got past me," with no specifics about how. The information boundary that prevents cheating within a single encounter also prevents cheating across encounters. The Architect can only remember what it experienced.
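One way to enforce that boundary mechanically is to filter the transcript before the extraction pass ever sees it. The event shape and the `visible_to` field here are hypothetical, but the principle is the one described above:

```python
def architect_view(transcript: list[dict]) -> list[dict]:
    """Keep only events the Architect perceived; unseen player actions
    (a stealth route, a hidden item) never reach the memory extractor."""
    return [ev for ev in transcript if "architect" in ev.get("visible_to", ())]
```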

Where this goes

This is the first version. I'm confident in the overall direction: impressions stored as vector embeddings, and the lossy synthesis into long term memory. However, nearly every parameter in the system is a guess that needs tuning through real encounters.

How many impressions should the Architect extract from a single encounter? Three to eight is my starting range. The extraction prompt — the instructions that determine what the Architect notices and remembers — is a design lever I expect to iterate on significantly. A prompt that emphasizes tactical lessons produces a strategically minded Architect. One that emphasizes emotional reactions produces a more characterful one. The right balance is something I'll find through observation, and it'll probably keep shifting.

Retrieval is simple right now: the Architect loads its memories once when an encounter starts and carries them through unchanged. There's room for mid-encounter recall later — the Architect recognizing a specific item or tactic because it's seen it before — but that adds complexity I don't want to take on until the basic loop is proven.

The question I'm most interested in is whether any of this actually changes how the Architect plays. Whether a player who comes back for a fifth encounter can feel the difference — whether the fight carries some residue of the four that came before it. I can verify by examining internal state that the Architect is incorporating knowledge into its reasoning, but will real players be able to tell? That's the experiment.

The infrastructure exists to support it. Now it needs real encounters.


This is the fifth post on the Cone Crows engineering blog. Subscribe to the RSS feed to follow along.