What is persistent memory for AI coding agents?

Persistent memory lets tools like Claude Code, Codex, or Gemini CLI recall context across sessions instead of starting cold. It spans static project files such as CLAUDE.md, MCP memory servers like agentmemory and claude-mem, and platform-native memory in Anthropic's Memory tool and Managed Agents.

Why isn't CLAUDE.md enough for persistent memory?

CLAUDE.md is read-only at session start. Anything the agent learns mid-session is lost when the conversation ends. Long-running refactors, multi-session debugging, and teams of agents sharing learnings all hit that wall. A read-write memory layer outside the file fills the gap.

What's the difference between CLAUDE.md, MCP memory servers, and Anthropic's Memory tool?

CLAUDE.md is a static file you write. MCP memory servers are external processes that capture and recall context for any MCP client. Anthropic's Memory tool is an API capability for agents you build with the SDK. They sit at three different layers: project, tool, and platform.

Which is better: agentmemory or claude-mem?

agentmemory is the cleaner solo-dev install with measured 95.2% R@5 on LongMemEval-S and zero infrastructure. claude-mem has more stars, a richer plugin marketplace, and a server-beta runtime backed by Postgres for teams. Solo dev: agentmemory. Team or production: claude-mem server-beta.

How does Anthropic Dreaming work?

Dreaming is a scheduled process inside Managed Agents that reviews past sessions and memory stores, extracts patterns across them, and curates the memory between runs. It surfaces recurring mistakes and shared workflows. It is a research preview and can run automatically or with human review.

Does the Anthropic Memory tool work with Claude Code the CLI?

No. The Memory tool is an API capability with type memory_20250818 for agents you build with the Anthropic SDK. Claude Code the CLI uses CLAUDE.md plus MCP servers for persistence today. If you want Memory-tool semantics in Claude Code, wrap it via an MCP server.

Can multiple AI agents share the same memory?

Yes. agentmemory exposes both an MCP server and a REST API so 15+ different agents can read the same store. Memory for Managed Agents allows one agent to share what it learned with other agents in the same workspace, with per-write audit logs and programmatic access.

How much context window does persistent memory actually save?

Anthropic reports 84% token savings on long-running tasks when the Memory tool is paired with context editing. agentmemory publishes around 92% context reduction with hybrid retrieval. claude-mem advertises roughly 10x savings through layered search. Real savings depend on how aggressive your hooks are.

AI Development

Persistent Memory for AI Coding Agents Beyond CLAUDE.md

Persistent memory lets an AI coding agent recall earlier sessions instead of restarting from zero. In 2026 the space splits into three tiers: static project files, MCP memory servers, and platform-native memory in Anthropic's Memory tool and Managed Agents. This guide walks each tier, the trade-offs, and how I pick.

May 29, 2026-

12 min read-Last updated: 2026-05-29

Claude CodeMemoryMCPAnthropicAgents

Table of Contents

TL;DR

CLAUDE.md is the floor, not the ceiling. It handles roughly 80% of solo workflows but breaks when sessions span days or agents need to learn.
MCP memory servers fill the gap. agentmemory (19.5K stars) and claude-mem (79.5K stars) are the two I reach for in 2026.
Anthropic's Memory tool (memory_20250818) plus context editing show 84% token savings on long-running API agents. Dreaming refines the memory across sessions.
The right answer depends on team size, who owns the storage, and whether you live in Claude Code, the API, or a Managed Agent.

Why CLAUDE.md Alone Runs Out of Room

CLAUDE.md loads at session start and never changes during the run. That is exactly what you want for project rules, but it is the wrong shape for what the agent learns while it works. A tricky bug, a quirky API timing window, the right magic flag for your build - none of it survives /exit. Next session opens cold and you re-explain.

I hit three breakpoints in practice. First, the file itself grows past a few hundred lines and starts costing real tokens per turn. Second, work spans multiple sessions - a week-long refactor or a flaky test you keep returning to. Third, multiple agents need to share something one of them figured out. CLAUDE.md does none of these.

Anthropic's own multi-session pattern in the Memory tool docs treats memory as a recovery mechanism: an initializer session bootstraps progress logs and checklists, every later session opens by reading them, and every session updates them before ending. That pattern is impossible inside a static CLAUDE.md.

If you have not written a CLAUDE.md yet, start with my CLAUDE.md guide first. This post picks up the day CLAUDE.md is no longer the answer.

The Three Tiers of Persistent Memory in 2026

Most existing posts list six levels and then leave you to wire them yourself. I find a three-tier frame more useful because each tier maps cleanly to who controls the storage and where the agent runs.

Tier 1 - Static Files

CLAUDE.md, AGENTS.md, .cursor/rules. Version-controlled.

Best for: project rules, conventions, build commands.

Tier 2 - MCP Memory Servers

agentmemory, claude-mem, mcp-memory-service. Local SQLite + retrieval.

Best for: cross-session recall in any MCP client.

Tier 3 - Platform-Native

Anthropic Memory tool, Dreaming, Memory for Managed Agents, Cloudflare Agent Memory.

Best for: API agents, teams, audit-heavy workflows.

Memory Taxonomy

Episodic, semantic, procedural.

Procedural is the underserved one. Hooks-based MCP servers attack it directly.

The mem0 team's State of AI Agent Memory 2026 report uses the classic episodic / semantic / procedural split. Procedural memory - learned workflows, tool-use habits, the right way to fix this codebase - is the one most current tools still skip. That is exactly the gap auto-capturing MCP servers target with hooks.

Tier 1 - Static Project Files (CLAUDE.md, AGENTS.md)

Tier 1 is the cheapest persistence you can buy: a text file in your repo. CLAUDE.md for Claude Code, AGENTS.md for Codex and several others, .cursor/rules for Cursor. The agent loads it on session start. You write it, you commit it, your team sees the same rules.

The strengths are real. Zero infrastructure, deterministic behavior, full diff history in git, and complete control over what the agent sees. The weakness is also real. It is read-only from the agent's side. Nothing the agent discovers ends up back in the file unless you paste it in yourself.

Hierarchical loading is the one trick that stretches Tier 1 further. Claude Code looks at the project CLAUDE.md first, then your personal ~/.claude/CLAUDE.md, then imports referenced via @path/to/file.md. That lets you keep a slim project file plus a personal layer for cross-project preferences. I cover the full pattern in my CLAUDE.md guide. When you have squeezed Tier 1 dry, that is when Tier 2 starts paying for itself.

Tier 2 - MCP Memory Servers (agentmemory, claude-mem)

Tier 2 is where the action is in 2026. An MCP memory server runs as a local process, exposes memory-shaped tools to your agent, and captures or recalls context across sessions. Because it speaks the Model Context Protocol, the same server works with Claude Code, Codex, Gemini CLI, and any other MCP client.

Three projects dominate the space. Star counts are accurate as of May 29, 2026 and will move.

Tool	Stars	Storage	Capture	Notable
agentmemory	19.5K	SQLite + BM25 + vector + KG	12 auto-capture hooks	95.2% R@5 LongMemEval-S, ~92% context reduction, 4-tier consolidation
claude-mem	79.5K	SQLite + FTS5 + Chroma	summary-on-clear hook	~10x token savings, web viewer, plugin marketplace, Postgres server-beta
mcp-memory-service	-	SQLite knowledge graph + REST	entity/relation/observation API	LangGraph, CrewAI, AutoGen integrations; autonomous consolidation

The headline numbers come from each project's benchmarks. agentmemory reports 95.2% R@5 on LongMemEval-S, a retrieval suite that tests how often the right fact gets pulled back. That number says the retrieval is solid; it does not say your agent codes 95% better. I keep that distinction in mind when reviewing claims.

Installing each

bashterminal

# agentmemory - npm global install, zero infra
npm install -g @agentmemory/agentmemory
agentmemory  # starts the MCP server and the dashboard on :3113

# claude-mem - npx, hooks into Claude Code via the plugin marketplace
npx claude-mem install
# or inside Claude Code:
# /plugin marketplace add thedotmack/claude-mem

# mcp-memory-service - Python, REST + MCP
pip install mcp-memory-service
mcp-memory-service serve

Wiring it into Claude Code

Once the server is up, Claude Code picks it up through your ~/.claude.json MCP block. The shape is the same one you use for any MCP server - I cover the pattern in detail in my MCP code execution guide. The minimal config for agentmemory looks like this:

json~/.claude.json

{
  "mcpServers": {
    "agentmemory": {
      "command": "agentmemory",
      "args": ["serve", "--stdio"]
    }
  }
}

Restart Claude Code, run /mcp, and you should see agentmemory in the connected list. Open http://localhost:3113 in a browser and you get the live dashboard - sessions, observations, what the agent has chosen to store. That dashboard is the single biggest reason I reach for agentmemory first: I can see what it is saving and tune before it pollutes the store.

The honest caveat for any hooks-based capture is that it costs tokens during the session. Every tool call your agent makes can trigger a capture, and captures are not free. Watch the dashboard for the first week and disable hooks you do not need. If you want a working MCP server you can read end to end before installing someone else's, my Jenkins MCP project is a small reference.

Tier 3 - Anthropic Memory Tool, Dreaming, Managed Agents

Tier 3 is what Anthropic and the major platforms shipped in the last quarter. It is the cleanest answer when the agent runs inside their world rather than yours.

Anthropic Memory tool

The Memory tool is an API capability you opt into per request. Tool type memory_20250818, name memory. When enabled, Claude automatically checks a /memories directory at the start of a task and can create, view,str_replace, insert, delete, and rename files there. Crucially, storage is client-side. You implement the handler and you control where the bytes live.

pythonagent.py

from anthropic import Anthropic

client = Anthropic()

message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=2048,
    tools=[{"type": "memory_20250818", "name": "memory"}],
    messages=[{
        "role": "user",
        "content": "Help me debug the timeout in fetch_page()."
    }],
)
print(message)

The official Python example shows the client-side handler you need to fill in. Pair the tool with context editing using the beta header context-management-2025-06-27 and Anthropic reports 84% token savings on long-running tasks. That is the number worth quoting when you have to justify the integration to a finance partner.

Dreaming

Dreaming, announced at Code with Claude on May 6-7 2026, is a scheduled process inside Claude Managed Agents that reviews past sessions and memory stores, extracts patterns, and curates them between runs. The marquee external number is from Anthropic's announcement: legal AI company Harvey reported a roughly 6x lift in task-completion rates after enabling it. That is Anthropic's data, not an independent benchmark, so quote it with attribution.

Dreaming is a research preview today. You decide whether it edits memory automatically or queues changes for review. I default to review-mode for client work and auto for my own tooling, where I can rebuild the store if something goes sideways.

Memory for Managed Agents

Memory for Managed Agents went into public beta on April 23, 2026. It stores what each agent learns across sessions as files, ships with per-write audit logs and programmatic access, and lets one agent share what it learned with other agents in the same workspace. If multiple agents on your team need to converge on a shared playbook, this is where it lives without you running a server. I cover the broader Managed Agents picture in Claude Managed Agents Outcomes.

Cloudflare Agent Memory

Cloudflare opened a private beta of Agent Memory on April 17, 2026. It runs on the edge and is the right fit when your agents are themselves Workers and you care about regional distribution more than tight integration with the Anthropic stack. I have not put it in production, so I will not pretend to have war stories.

The Memory tool is not Claude Code memory

This is the single most common reader confusion. The memory_20250818 tool is an API capability you wire into agents you build with the SDK. Claude Code the CLI does not use that tool directly today. If you want Memory-tool-style semantics inside Claude Code, the path is an MCP server (Tier 2), not a flag.

Adding Persistent Memory to Claude Code in Five Minutes

Here is the fastest path to working persistent memory in Claude Code right now. I am using agentmemory because it is Apache-2.0, zero-infra, and the dashboard makes the first day sane. The pattern is the same for any MCP memory server - swap the binary and the config block.

bashterminal

# 1. Install agentmemory globally
npm install -g @agentmemory/agentmemory

# 2. Register it in Claude Code (or edit ~/.claude.json directly)
claude mcp add agentmemory agentmemory -- serve --stdio

# 3. Verify
claude
> /mcp
# expect: agentmemory  connected

# 4. Open the dashboard while a session runs
open http://localhost:3113

# 5. In a fresh session the next day:
> What do you remember about this project?

Step five is where the value shows up. The agent surfaces what it captured the day before - files it touched, patterns it noticed, decisions it made. If the recall is wrong or noisy, head to the dashboard and prune. Treat the first week as tuning, not production.

First-24-hours tip

Do not trust auto-capture blindly. Review the dashboard once a day for the first week and delete anything that looks like noise - failed experiments, half-finished thoughts, dead ends. A clean store at day seven beats a sprawling one at day one. The same discipline applies to regression-proofing your Claude Code workflows: if the input is unreliable, the output will be too.

Which Tier Should You Use? A Decision Matrix

Tier choice is mostly about who runs the agent and who owns the storage. Match the row to your situation and start there. Stacking tiers is fine and common, but you ship faster by starting at the lowest one that actually solves the problem.

If you...	Use
Solo dev, one project, sessions under two hours	CLAUDE.md only (Tier 1)
Solo dev, week-long work, want hands-off recall	CLAUDE.md + agentmemory or claude-mem (Tier 1 + Tier 2)
Team with shared agents and audit needs	Memory for Managed Agents or claude-mem server-beta (Tier 3)
Building your own agent on the Anthropic API	Memory tool + context editing (Tier 3)
Need edge-distributed memory across regions	Cloudflare Agent Memory (Tier 3)

Mature setups stack Tier 1 plus Tier 2 plus Dreaming, but you do not need that on day one. Most of the productivity wins from persistent memory show up at Tier 2 with a single well-tuned MCP server, not at the top of the stack.

Trade-offs: Privacy, Token Cost, and What Breaks

Memory is one more system that can fail, leak, or quietly lie to the agent. Four things to watch.

Privacy and storage

Every MCP memory server in Tier 2 lives on your disk by default. That is the right answer for client work. Anthropic's Memory tool is also client-side. If you write your own handler, validate paths and reject anything outside /memories. The docs flag path traversal explicitly because LLMs occasionally try to read places they should not. Use pathlib.Path.resolve() and relative_to() in Python, or the equivalent in your runtime.

Token cost

Capture is not free. Hooks-based tools add tokens to every tool call. agentmemory publishes roughly 170K tokens per year at the iii engine, which works out near $10 annually for the storage path itself; LLM-summarized approaches run four to six times that. On the recall side Anthropic measures 84% token savings on long-running tasks with Memory tool + context editing, and claude-mem advertises about 10x savings through layered search. Real savings depend on how disciplined your hooks and your prompts are. Pair this with Claude Code cost tracking before and after you turn memory on so you measure the delta, not a vibe.

Staleness

A stored fact that became wrong after a refactor is worse than no memory. The store will confidently return outdated guidance and the agent will trust it. Either schedule a weekly prune or use Dreaming-style curation to consolidate, deprecate, and rewrite. mem0's report calls this out as one of six unsolved production gaps.

Cross-agent leakage

If your coding agent and your customer-support agent share a memory store, customer data can surface into a pair-programming session. Segment workspaces by purpose and apply the same allowlisting discipline you would for any agent surface. If you want the full security framing for agent workflows, my guide on hardening AI agents in CI/CD against prompt injection covers the threat model.