
3. Memory system — where and how dynamic context accretes

Not RAG, not a vector DB — memory built from a handful of markdown files. Four memory types, the index strategy, auto-load vs on-demand, and the pruning discipline that keeps it alive.

This is part 3 of the Alice Way series. In part 2, Persona design, I split the persona ("who this collaborator is") from memory ("what they currently know to be true"). This post is the record of how I built that memory.

I started thinking about elaborate things. Vector DB, embeddings, RAG pipelines. Killed all of it within a few days. The amount I actually accrete in a day is not that large. What I needed was a place that was non-volatile, human-readable, and auto-loadable by the tool. The answer was the file system.

0. Memory is an extension of mind

If a persona is "who this collaborator is," memory is "what this collaborator currently has in mind."

A context that evaporates forces the operator to pay the cost of rebuilding the same mind every day. What was decided yesterday, why it was decided, what did not work — if all of that is gone at the start of the next session, the operator is rebuilding yesterday's portion of their mind every morning.

Memory is not RAG, not a vector DB. It is the simplest possible device for keeping the operator's decisions, feedback, and context from being lost by the next session.

From §1 onward, this post covers how that device was built — out of a handful of markdown files and a single index.

1. The simple decision — markdown files

What I converged on:

memory/
├── MEMORY.md              # index (auto-loaded)
├── user_role.md           # user type
├── feedback_testing.md    # feedback type
├── project_q2_release.md  # project type
├── reference_grafana.md   # reference type
└── ...
  • One memory = one file
  • A single index (MEMORY.md) with one-line pointers to every file
  • Every file is markdown, directly human-readable
  • The tool auto-loads the first N lines of the index at session start

That is the whole thing. No fancy infrastructure. Managed in git, edited in a text editor, searched with grep.

The context you actually accrete every day is surprisingly small. Discipline matters more than infrastructure.
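As a sketch, the "auto-load the first N lines" step is a few lines of Python. The path and the line count are assumptions from my setup, not anything the tool mandates:

```python
def load_index(index_path: str, n: int = 200) -> str:
    """Return the first n lines of the index file.

    This is the only memory read unconditionally at session start;
    everything else is Read on demand via the index's pointers.
    """
    with open(index_path, encoding="utf-8") as f:
        return "".join(line for _, line in zip(range(n), f))
```

Truncating by line count rather than loading the whole file is the point: the index can grow a little past the window without blowing up session-start cost, which is also why §5's pruning watches the index size.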

2. Four memory types

Not all information is treated the same. I split it into four types.

2.1 user — facts about the operator

Who the operator is, what role they play, what areas they know.

---
name: user-role-and-expertise
description: Operator's role · expertise · platforms
metadata:
  type: user
---
 
Operator is a senior engineer, primary languages X·Y·Z, mostly works on A·B.
Frontend is recent — explain frontend concepts via backend analogies.

The biggest payoff of this type is that responses get tuned to the operator's level and domain. The student tone goes away, and things the operator already knows stop getting re-explained.

2.2 feedback — explicit guidance

Explicit guidance from the operator. Both "don't do this" and "keep doing this."

---
name: integration-test-policy
description: Integration tests must run against a real DB — no mocks
metadata:
  type: feedback
---
 
Run integration tests against the real DB, not a mocked one.
 
**Why:** Last quarter the mocked test passed but the prod migration broke.
**How to apply:** Every test under integration/. unit/ may still mock.

For the feedback type, always write Why and How to apply alongside the rule. Rules without a reason fail at the edges. With the reason, you can carry the same intent into new situations.

Feedback gets recorded for both corrections after failure and validations after success. The second one is easy to skip, but skipping it means re-negotiating the same good decision in the next session.

2.3 project — context for work in progress

Facts about the current work. The most volatile type.

---
name: q2-merge-freeze
description: Merge freeze schedule for Q2 mobile release cut
metadata:
  type: project
---
 
Merge freeze begins 2026-03-05 (mobile release branch cut).
 
**Why:** Mobile team is cutting their release branch.
**How to apply:** Flag non-critical PR work scheduled after that date.

The key is converting relative time to absolute time. Expressions like "next week" or "Thursday" lose meaning a few days after they hit memory. Convert and record them as absolute dates like "2026-03-05".

The project type goes stale fastest, so it is also the most pruned (see §5).
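The relative-to-absolute conversion is mechanical once "today" is pinned down at write time. A sketch (the helper name is mine, not part of any tool):

```python
from datetime import date, timedelta

def next_weekday(today: date, weekday: int) -> date:
    """Absolute date of the next given weekday (0=Mon .. 6=Sun),
    strictly after `today`, so "Thursday" said on a Thursday means
    the following week."""
    days_ahead = (weekday - today.weekday() - 1) % 7 + 1
    return today + timedelta(days=days_ahead)
```

What matters is not the arithmetic but the discipline: the conversion happens at the moment the memory is written, because that is the only moment "next week" still has a referent.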

2.4 reference — pointers to external systems

Pointers to where information lives. Not the information itself.

---
name: oncall-latency-dashboard
description: Grafana board grafana.internal/d/api-latency — the oncall latency dashboard
metadata:
  type: reference
---
 
Dashboard to check when touching request-handling code.
Oncall watches this; it's what pages someone, so compare before/after on changes.

reference records what something is at a given location in an external system (Linear, Grafana, Notion, Slack, etc.). The information itself changes; the location is stable. Remembering the location saves you from re-hunting every time.

3. The index — a single MEMORY.md

Every memory file is discovered through one index.

# Memory
 
> First 200 lines auto-load. Core rules live in the persona; this is pointers only.
 
## Memory file index
 
### Rules / procedures
- [work-process.md](rules/work-process.md) — work intake procedure
- [multi-agent.md](rules/multi-agent.md) — subagent delegation, Quality Gate
- ...
 
### Feedback / references
- [feedback/](feedback/) — testing / model_policy / env_minimalism / ...
- [reference/dashboards.md](reference/dashboards.md) — oncall dashboard index

The index is not memory. It is a collection of one-line hooks for the memories. The index itself never carries content — always pointers.

This split keeps the index short. Even though the tool only auto-loads the first N lines at session start, every memory's hook fits in there. When a specific memory becomes relevant, only that one file gets Read at that moment.

An index plus trigger-based Read covers most cases more simply than RAG.
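One way to see why the one-line descriptions carry the whole scheme: even a deliberately crude keyword match against each index line is enough to decide which bodies to Read. A sketch, not the tool's actual mechanism:

```python
import re

def relevant_entries(index_text: str, task: str, min_overlap: int = 2):
    """Return paths of index entries whose description shares
    at least `min_overlap` words with the task description.

    Assumes index lines shaped like '- [file.md](path) — description'.
    """
    task_words = set(re.findall(r"\w+", task.lower()))
    hits = []
    for line in index_text.splitlines():
        m = re.match(r"-\s*\[.+?\]\((.+?)\)\s*[—-]\s*(.+)", line.strip())
        if not m:
            continue
        path, desc = m.groups()
        overlap = task_words & set(re.findall(r"\w+", desc.lower()))
        if len(overlap) >= min_overlap:
            hits.append(path)
    return hits
```

In practice the model's own judgment replaces the word-overlap heuristic, but the shape is the same: the hook line is the only signal available at decision time, so it has to earn its one line.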

4. Auto-load vs on-demand — the token economy

The system combines two loading modes.

| Mode | What comes in | When |
|---|---|---|
| Auto-load | First N lines of MEMORY.md | Every session start |
| On-demand Read | Specific files the index points to | When entering work related to that file |

This split is the heart of the token economy. Reading every memory file every session blows up tokens. Auto-load only the index, Read the bodies only when needed, and almost every session starts light.
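With hypothetical numbers (purely illustrative, not measurements from my setup), the shape of the trade-off is easy to see:

```python
# All numbers hypothetical, chosen only to illustrate the trade-off.
files = 50               # memory files on disk
avg_file_lines = 40      # average body length
index_lines = 200        # auto-load window
tokens_per_line = 10     # rough prose estimate

# Naive approach: read every body every session.
read_everything = files * avg_file_lines * tokens_per_line

# This system: index always, bodies only when triggered.
index_only = index_lines * tokens_per_line
deep_session = index_only + 3 * avg_file_lines * tokens_per_line
```

Under these assumptions a shallow session pays 2,000 tokens for the index and nothing else, while a deep session pulling three bodies pays 3,200 — still a fraction of the 20,000 the naive approach pays every single time.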

Two triggers decide when a body gets Read.

  1. The operator explicitly references it — "look at the integration-test policy"
  2. The work itself enters the relevant area — starting a DB-related PR pulls in DB-related memories

The second matters more. The operator does not have to say "look at this policy" every time — the collaborator pulls relevant memory in on its own. The one-line description in the index is the basis for that decision.

5. Pruning — memory collapses if it only grows

The biggest trap in a memory system is accumulating without pruning. Pure accumulation creates two problems.

  1. Stale information stays alive — a project that ended a month ago still has its schedule in memory, pulling toward wrong judgments
  2. The index inflates — items grow past the 200-line auto-load range, and auto-load loses its meaning

I run the following pruning pattern.

| Type | Pruning cadence | Criterion |
|---|---|---|
| user | Almost never | Identity changes slowly |
| feedback | Once a quarter | Remove guidance that no longer holds |
| project | Once a month | Move closed projects to an archive directory |
| reference | Twice a year | Remove broken links and gone systems |

Archiving is not deleting. Files move to a separate directory like memory/archived/. They drop out of the index but stay in git history and on disk, so you can pull them back if needed.

One more pattern — an auto audit tool. It periodically scans the memory directory and produces a report like "this item has not been referenced for 60+ days." The operator reads that report and makes pruning decisions quickly.

6. Failure patterns — every one I tried

Patterns I tried and abandoned on the way to this system.

6.1 Single giant file

Initially I put every memory into one file. A single notes.md. Within days it crossed 1,000 lines, and the tool spent almost all session-start tokens reading it.

Lesson: Memory must be split. Index + body files separation.

6.2 Splitting too finely

The opposite — splitting too fine. One file per decision. 50 files appeared, the index ballooned and consumed the entire auto-load window.

Lesson: One file = one "topic," not one "decision." Multiple decisions on the same topic go into the same file.

6.3 Indiscriminate accumulation

I tried an automation that would "summarize today's work and append it to memory" at session end. Memory grew every day, and important decisions drowned in the noise.

Lesson: What goes into memory is a decision, fact, or pointer that is still useful in the next session. Today-only progress goes on the task board or in a history log, not in memory.

6.4 Plain text without metadata

I started with plain markdown, no frontmatter. As the count grew, "which type is this" and "why was this written" became untrackable.

Lesson: Force at least three frontmatter fields — name, description, type. In the body, two more lines — Why and How to apply.
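That lesson is easy to enforce with a tiny lint pass. A sketch, assuming the same flat frontmatter shape as the examples in §2 (a real setup might hang this on a git pre-commit hook):

```python
REQUIRED_FIELDS = ("name", "description", "type")
REQUIRED_MARKERS = ("**Why:**", "**How to apply:**")

def lint_memory(text: str):
    """Return a list of problems: missing frontmatter fields,
    and missing Why / How-to-apply lines in the body."""
    problems = []
    head, _, body = text.partition("\n---\n")   # naive frontmatter split
    for field in REQUIRED_FIELDS:
        if f"{field}:" not in head:
            problems.append(f"missing frontmatter field: {field}")
    for marker in REQUIRED_MARKERS:
        if marker not in body:
            problems.append(f"missing body line: {marker}")
    return problems
```

An empty list means the file carries enough context to survive a few months of distance; anything else means future-you will be reverse-engineering past-you.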

7. Actual effect

What changed after adopting this system.

| Item | Before | After |
|---|---|---|
| Context restoration at session start | 5–10 min | Instant (index auto-load) |
| Re-negotiating the same decision | Often | Almost never |
| Re-explaining yesterday's work | Every time | Never (it's in memory) |
| Wrong judgment from stale info | Sometimes | Rare (pruning) |
| Token spend pattern | Flat-high every session | Proportional to work (shallow stays light) |

The biggest change is the last row. Token spend tracks the depth of the work. Shallow work finishes nearly for free; resources go into deep work.

8. Compressed into one principle

Memory system design collapses into one principle.

"The index is always there. The body only comes in when needed. What is no longer valid drops out of the index."

When all three of those hold, memory stops being a growing burden and starts being an asset that accelerates the work.

The next post covers the next layer the persona and memory point at — skills and slash commands, i.e. which repetitions are worth promoting into a one-line invocation.

