Architecture Evolution
The Problem We Solved
Goal: Create a memory system that makes Claude smarter across sessions without the user noticing it exists.
Challenge: How do you observe AI agent behavior, compress it intelligently, and serve it back at the right time - all without slowing down or interfering with the main workflow?
This is the story of how claude-mem evolved from a simple idea to a production-ready system, and the key architectural decisions that made it work.
v5.x: Maturity and User Experience
After establishing the solid v4 architecture, v5.x focused on user experience, visualization, and polish.
v5.1.2: Theme Toggle (November 2025)
What Changed: Added light/dark mode theme toggle to viewer UI
New Features:
- User-selectable theme preference (light, dark, system)
- Persistent theme settings in localStorage
- Smooth theme transitions
- System preference detection
Implementation:
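// Theme context with persistence (React + TypeScript)
import { createContext, useEffect, useState, type ReactNode } from 'react';

type Theme = 'light' | 'dark' | 'system';

// Context shape is assumed here; the original snippet does not show its definition
const ThemeContext = createContext<{ theme: Theme; setTheme: (t: Theme) => void }>({
  theme: 'system',
  setTheme: () => {},
});

const ThemeProvider = ({ children }: { children: ReactNode }) => {
  const [theme, setTheme] = useState<Theme>(() => {
    // localStorage returns string | null, so validate before trusting it
    const stored = localStorage.getItem('claude-mem-theme');
    return stored === 'light' || stored === 'dark' ? stored : 'system';
  });
  // Persist every change so the choice survives reloads
  useEffect(() => {
    localStorage.setItem('claude-mem-theme', theme);
  }, [theme]);
  return (
    <ThemeContext.Provider value={{ theme, setTheme }}>
      {children}
    </ThemeContext.Provider>
  );
};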
Why It Matters: Users working in different lighting conditions can now customize the viewer for comfort.
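The "system" option implies one more step: turning the stored preference into the theme that is actually rendered. The following is a minimal sketch of that step, not the viewer's real code; `parsePreference` and `resolveTheme` are hypothetical names, and the `systemPrefersDark` flag is assumed to come from something like a `prefers-color-scheme` media query.

```typescript
type ThemePreference = 'light' | 'dark' | 'system';
type ResolvedTheme = 'light' | 'dark';

// Hypothetical helper: narrow an arbitrary stored string to a valid preference,
// falling back to 'system' when the value is missing or unrecognized.
function parsePreference(stored: string | null): ThemePreference {
  return stored === 'light' || stored === 'dark' || stored === 'system'
    ? stored
    : 'system';
}

// Hypothetical helper: 'system' defers to the OS setting, anything else wins outright.
function resolveTheme(
  preference: ThemePreference,
  systemPrefersDark: boolean
): ResolvedTheme {
  if (preference === 'system') {
    return systemPrefersDark ? 'dark' : 'light';
  }
  return preference;
}
```

Keeping the resolution step pure makes it easy to test without a browser.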
v5.1.1: Worker Startup Fix (November 2025) - Now Deprecated
Note: This section describes a historical PM2-based approach that has been replaced with Bun in later versions.
The Problem: Worker startup failed on Windows with ENOENT error when using PM2
Historical Solution: Used full path to PM2 binary instead of relying on PATH
Current Approach: The project now uses Bun for process management, which provides better cross-platform compatibility and eliminates these PATH-related issues.
Impact: Cross-platform compatibility was restored, so Windows users could run claude-mem without this startup failure.
v5.1.0: Web-Based Viewer UI (October 2025)
The Breakthrough: Real-time visualization of memory stream
What We Built:
- React-based web UI at http://localhost:37777
- Server-Sent Events (SSE) for real-time updates
- Infinite scroll pagination
- Project filtering
- Settings persistence (sidebar state, selected project)
- Auto-reconnection with exponential backoff
- GPU-accelerated animations
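The auto-reconnection behavior listed above can be illustrated with a small delay calculator. This is a sketch under stated assumptions: the function name, the 1-second base delay, and the 30-second cap are illustrative values, not taken from the viewer's source.

```typescript
// Exponential backoff: the delay doubles with each failed attempt, up to a cap.
// baseMs and maxMs are assumed defaults, chosen only for illustration.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  const exponent = Math.max(0, Math.floor(attempt));
  return Math.min(maxMs, baseMs * 2 ** exponent);
}
```

A reconnect loop would call `backoffDelayMs(attempt)` before each retry and reset `attempt` to 0 once the SSE stream is open again.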
New Worker Endpoints (8 additions):
GET / # Serves viewer HTML
GET /stream # SSE real-time updates
GET /api/prompts # Paginated user prompts
GET /api/observations # Paginated observations
GET /api/summaries # Paginated session summaries
GET /api/stats # Database statistics
GET /api/settings # User settings
POST /api/settings # Save settings
Database Enhancements:
// New SessionStore methods for viewer
getRecentPrompts(limit, offset, project?)
getRecentObservations(limit, offset, project?)
getRecentSummaries(limit, offset, project?)
getStats()
getUniqueProjects()
React Architecture:
src/ui/viewer/
├── components/
│ ├── Header.tsx # Navigation + stats
│ ├── Sidebar.tsx # Project filter
│ ├── Feed.tsx # Infinite scroll
│ └── cards/
│ ├── ObservationCard.tsx
│ ├── PromptCard.tsx
│ ├── SummaryCard.tsx
│ └── SkeletonCard.tsx
├── hooks/
│ ├── useSSE.ts # Real-time events
│ ├── usePagination.ts # Infinite scroll
│ ├── useSettings.ts # Persistence
│ └── useStats.ts # Statistics
└── utils/
├── merge.ts # Data deduplication
└── format.ts # Display formatting
Build Process:
// esbuild bundles everything into single HTML file
esbuild.build({
entryPoints: ['src/ui/viewer/index.tsx'],
bundle: true,
outfile: 'plugin/ui/viewer.html',
loader: { '.tsx': 'tsx', '.woff2': 'dataurl' },
define: { 'process.env.NODE_ENV': '"production"' },
});
Why It Matters: Users can now see exactly what’s being captured in real-time, making the memory system transparent and debuggable.
v5.0.3: Smart Install Caching (October 2025)
The Problem: npm install ran on every SessionStart (2-5 seconds)
The Insight: Dependencies rarely change between sessions
The Solution: Version-based caching
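// Check version marker before installing
import { existsSync, readFileSync, writeFileSync } from 'node:fs';

const currentVersion = getPackageVersion();
// The marker is absent on first run, so guard the read instead of letting it throw
const installedVersion = existsSync('.install-version')
  ? readFileSync('.install-version', 'utf-8').trim()
  : null;
if (currentVersion !== installedVersion) {
  // Only install if the version changed or no marker exists yet
  await runNpmInstall();
  writeFileSync('.install-version', currentVersion);
}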
Cached Check Logic:
- Does node_modules exist?
- Does .install-version match the package.json version?
- Is better-sqlite3 present? (Legacy: now uses bun:sqlite which requires no installation)
Impact:
- SessionStart hook: 2-5 seconds → 10ms (99.5% faster)
- Only installs on: first run, version change, missing deps
- Better Windows error messages with build tool help
v5.0.2: Worker Health Checks (October 2025)
What Changed: More robust worker startup and monitoring
New Features:
// Health check endpoint
app.get('/health', (req, res) => {
  res.json({
    status: 'ok',
    uptime: process.uptime(),
    port: WORKER_PORT,
    memory: process.memoryUsage(),
  });
});

// Smart worker startup
async function ensureWorkerHealthy() {
  const healthy = await isWorkerHealthy(1000);
  if (!healthy) {
    await startWorker();
    await waitForWorkerHealth(10000);
  }
}
Benefits:
- Graceful degradation when worker is down
- Auto-recovery from crashes
- Better error messages for debugging
v5.0.1: Stability Improvements (October 2025)
What Changed: Various bug fixes and stability enhancements
Key Fixes:
- Fixed race conditions in observation queue processing
- Improved error handling in SDK worker
- Better cleanup of stale worker processes
- Enhanced logging for debugging
v5.0.0: Hybrid Search Architecture (October 2025)
The Evolution: SQLite FTS5 + Chroma vector search
What We Added:
┌─────────────────────────────────────────────────────────┐
│ HYBRID SEARCH │
│ │
│ Text Query → SQLite FTS5 (keyword matching) │
│ ↓ │
│ Chroma Vector Search (semantic) │
│ ↓ │
│ Merge + Re-rank Results │
└─────────────────────────────────────────────────────────┘
New Dependencies:
- chromadb - Vector database for semantic search
- Python 3.8+ - Required by chromadb
MCP Tools Enhancement:
// Chroma-backed semantic search
search_observations({
query: "authentication bug",
useSemanticSearch: true // Uses Chroma
});
// Falls back to FTS5 if Chroma unavailable
Why Hybrid:
- FTS5: Fast keyword matching, no dependencies
- Chroma: Semantic understanding, finds related concepts
- Graceful degradation: Works without Chroma (FTS5 only)
Trade-offs:
- Added Python dependency (optional)
- Increased installation complexity
- Better search relevance
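The "Merge + Re-rank Results" step in the diagram is not spelled out in this document. One common way to combine a keyword ranking with a semantic ranking is reciprocal rank fusion (RRF), sketched below. This is an illustrative technique and not necessarily what claude-mem does; the function name and the constant `k = 60` are assumptions.

```typescript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per result,
// so items ranked well by both FTS5 and Chroma float to the top.
function reciprocalRankFusion(
  keywordIds: number[],
  semanticIds: number[],
  k = 60
): number[] {
  const scores = new Map<number, number>();
  for (const list of [keywordIds, semanticIds]) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Sort by fused score (descending); break ties by ID for a stable order.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1] || a[0] - b[0])
    .map(([id]) => id);
}
```

When Chroma is unavailable, passing an empty `semanticIds` list leaves the FTS5 order unchanged, which matches the graceful-degradation goal described above.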
MCP Architecture Simplification (December 2025)
The Problem: Complex MCP Implementation
Before:
9+ MCP tools registered at session start:
- search_observations
- find_by_type
- find_by_file
- find_by_concept
- get_recent_context
- get_observation
- get_session
- get_prompt
- help
Problems:
- Overlapping operations (search_observations vs find_by_type)
- Complex parameter schemas (~2,500 tokens in tool definitions)
- No built-in workflow guidance
- High cognitive load for Claude (which tool to use?)
- Code size: ~2,718 lines in mcp-server.ts
The Insight: Progressive disclosure should be built into tool design itself, not something Claude has to remember.
The Solution: 3-Layer Workflow
After:
4 MCP tools following 3-layer workflow:
1. __IMPORTANT - Workflow documentation (always visible)
   "3-LAYER WORKFLOW (ALWAYS FOLLOW):
    1. search(query) → Get index with IDs
    2. timeline(anchor=ID) → Get context
    3. get_observations([IDs]) → Fetch details
    NEVER fetch full details without filtering first."
2. search - Layer 1: Get index with IDs (~50-100 tokens/result)
3. timeline - Layer 2: Get chronological context
4. get_observations - Layer 3: Fetch full details (~500-1,000 tokens/result)
Benefits:
- Progressive disclosure enforced by tool structure
- No overlapping operations
- Simple schemas (additionalProperties: true)
- Clear workflow pattern
- Code size: ~312 lines in mcp-server.ts (88% reduction)
- ~10x token savings
Migration: Skill-Based Search Removed
Previously: Used skill-based search
- mem-search skill invoked via natural language
- HTTP API called directly via curl
- Progressive disclosure through skill loading
- 17 skill documentation files
Now: Removed skill-based approach
- MCP-only architecture
- Native MCP protocol (better Claude integration)
- Works with both Claude Desktop and Claude Code
- Simpler to maintain (no skill files)
- All 19 mem-search skill files removed (~2,744 lines)
Key Architectural Changes
MCP Server Refactor:
Before:
// Complex parameter schemas
{
  name: "search_observations",
  inputSchema: {
    type: "object",
    properties: {
      query: { type: "string", description: "..." },
      type: { type: "array", items: { enum: [...] } },
      format: { enum: ["index", "full"] },
      limit: { type: "number", minimum: 1, maximum: 100 },
      // ... many more parameters
    }
  }
}
After:
// Simple schemas with workflow guidance
{
  name: "search",
  description: "Step 1: Search memory. Returns index with IDs.",
  inputSchema: {
    type: "object",
    properties: {},
    additionalProperties: true // Accept any parameters
  }
}
Workflow Enforcement:
Before: Claude had to remember progressive disclosure pattern
After: Tool structure steers Claude through the steps in order
- Can’t get details without IDs from search
- Can’t search without seeing __IMPORTANT reminder
- Timeline provides middle ground (context without full details)
Impact
Token Efficiency:
Traditional: Fetch 20 observations upfront
→ 10,000-20,000 tokens
→ Only 2 observations relevant (90% waste)
3-Layer Workflow:
→ search (20 results): ~1,000-2,000 tokens
→ Review index, identify 3 relevant IDs
→ get_observations (3 IDs): ~1,500-3,000 tokens
→ Total: 2,500-5,000 tokens (50-75% savings)
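The arithmetic behind these ranges can be checked with a few lines. This sketch uses the per-result costs quoted earlier for the search and get_observations tools (about 50-100 tokens per index entry, 500-1,000 per full observation); the helper names are made up for illustration.

```typescript
interface TokenRange {
  min: number;
  max: number;
}

// Per-result costs quoted in the tool descriptions above.
const INDEX_PER_RESULT: TokenRange = { min: 50, max: 100 };
const FULL_PER_RESULT: TokenRange = { min: 500, max: 1000 };

function scale(perItem: TokenRange, count: number): TokenRange {
  return { min: perItem.min * count, max: perItem.max * count };
}

function add(a: TokenRange, b: TokenRange): TokenRange {
  return { min: a.min + b.min, max: a.max + b.max };
}

function savingsPercent(baseline: number, actual: number): number {
  return Math.round((1 - actual / baseline) * 100);
}

// Traditional: 20 full observations fetched upfront.
const traditional = scale(FULL_PER_RESULT, 20);
// 3-layer: 20 index entries, then 3 full observations.
const layered = add(scale(INDEX_PER_RESULT, 20), scale(FULL_PER_RESULT, 3));
```

Comparing like with like (low end to low end, high end to high end) gives 75% savings. The 50% figure is the pessimistic pairing: the high end of the layered cost against the low end of the traditional cost.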
Code Simplicity:
- MCP server: 2,718 lines → 312 lines (88% reduction)
- Removed: 19 skill files (~2,744 lines)
- Net reduction: ~5,150 lines of code removed
User Experience:
- Same natural language interaction
- Better token efficiency
- Clearer architecture
- Works identically on Claude Desktop and Claude Code
Design Philosophy
Progressive Disclosure Through Structure:
The 3-layer workflow embodies progressive disclosure at the architectural level:
- Layer 1 (Index) - “What exists?” - Cheap survey of options
- Layer 2 (Timeline) - “What was happening?” - Context around specific points
- Layer 3 (Details) - “Tell me everything” - Full details only when justified
Each layer provides a decision point where Claude can:
- Stop if irrelevant
- Get more context if uncertain
- Dive deep if confident
This makes it structurally difficult to waste tokens.
v1-v2: The Naive Approach
The First Attempt: Dump Everything
Architecture:
PostToolUse Hook → Save raw tool outputs → Retrieve everything on startup
What we learned:
- ❌ Context pollution (thousands of tokens of irrelevant data)
- ❌ No compression (raw tool outputs are verbose)
- ❌ No search (had to scan everything linearly)
- ✅ Proved the concept: Memory across sessions is valuable
Example of what went wrong:
SessionStart loaded:
- 150 file read operations
- 80 grep searches
- 45 bash commands
- Total: ~35,000 tokens
- Relevant to current task: ~500 tokens (1.4%)
v3: Smart Compression, Wrong Architecture
The Breakthrough: AI-Powered Compression
New idea: Use Claude itself to compress observations
Architecture:
PostToolUse Hook → Queue observation → SDK Worker → AI compression → Store insights
What we added:
- Claude Agent SDK integration - Use AI to compress observations
- Background worker - Don’t block main session
- Structured observations - Extract facts, decisions, insights
- Session summaries - Generate comprehensive summaries
What worked:
- ✅ Compression ratio: 10:1 to 100:1
- ✅ Semantic understanding (not just keyword matching)
- ✅ Background processing (hooks stayed fast)
- ✅ Search became useful
What didn’t work:
- ❌ Still loaded everything upfront
- ❌ Session ID management was broken
- ❌ Aggressive cleanup interrupted summaries
- ❌ Multiple SDK sessions per Claude Code session
The Key Realizations
Realization 1: Progressive Disclosure
Problem: Even compressed observations can pollute context if you load them all.
Insight: Humans don’t read everything before starting work. Why should AI?
Solution: Show an index first, fetch details on-demand.
❌ Old: Load 50 observations (8,500 tokens)
✅ New: Show index of 50 observations (800 tokens)
Agent fetches 2-3 relevant ones (300 tokens)
Total: 1,100 tokens vs 8,500 tokens
Impact:
- 87% reduction in context usage
- 100% relevance (only fetch what’s needed)
- Agent autonomy (decides what’s relevant)
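To make the index-first idea concrete, here is a sketch of how one index line might be produced from a stored observation. The line layout, the `formatIndexLine` name, and the four-characters-per-token estimate are all assumptions for illustration; the real index format may differ.

```typescript
interface ObservationSummary {
  id: number;
  type: string;
  title: string;
  narrative: string;
}

// Rough heuristic (an assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// One compact line per observation: enough to decide whether to fetch it,
// with the cost of fetching shown up front.
function formatIndexLine(obs: ObservationSummary): string {
  return `#${obs.id} [${obs.type}] ${obs.title} (~${estimateTokens(obs.narrative)} tokens)`;
}
```

Showing the estimated token cost next to each title is what lets the agent weigh relevance against cost before fetching details.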
Realization 2: Session ID Chaos
Problem: SDK session IDs change on every turn.
What we thought:
// ❌ Wrong assumption
UserPromptSubmit → Capture session ID once → Use forever
Reality:
// ✅ Actual behavior
Turn 1: session_abc123
Turn 2: session_def456
Turn 3: session_ghi789
Why this matters:
- Can’t resume sessions without tracking ID updates
- Session state gets lost between turns
- Observations get orphaned
Solution:
// Capture from system init message
for await (const msg of response) {
  if (msg.type === 'system' && msg.subtype === 'init') {
    sdkSessionId = msg.session_id;
    await updateSessionId(sessionId, sdkSessionId);
  }
}
Realization 3: Graceful vs Aggressive Cleanup
v3 approach:
// ❌ Aggressive: Kill worker immediately
SessionEnd → DELETE /worker/session → Worker stops
Problems:
- Summary generation interrupted mid-process
- Pending observations lost
- Race conditions everywhere
v4 approach:
// ✅ Graceful: Let worker finish
SessionEnd → Mark session complete → Worker finishes → Exit naturally
Benefits:
- Summaries complete successfully
- No lost observations
- Clean state transitions
Code:
// v3: Aggressive
async function sessionEnd(sessionId: string) {
  await fetch(`http://localhost:37777/sessions/${sessionId}`, {
    method: 'DELETE'
  });
}

// v4: Graceful
async function sessionEnd(sessionId: string) {
  await db.run(
    'UPDATE sdk_sessions SET completed_at = ? WHERE id = ?',
    [Date.now(), sessionId]
  );
}
Realization 4: One Session, Not Many
Problem: We were creating multiple SDK sessions per Claude Code session.
What we thought:
Claude Code session → Create SDK session per observation → 100+ SDK sessions
Reality should be:
Claude Code session → ONE long-running SDK session → Streaming input
Why this matters:
- SDK maintains conversation state
- Context accumulates naturally
- Much more efficient
Implementation:
// ✅ Streaming Input Mode
async function* messageGenerator(): AsyncIterable<UserMessage> {
  // Initial prompt
  yield {
    role: "user",
    content: "You are a memory assistant..."
  };

  // Then continuously yield observations
  while (session.status === 'active') {
    const observations = await pollQueue();
    for (const obs of observations) {
      yield {
        role: "user",
        content: formatObservation(obs)
      };
    }
    await sleep(1000);
  }
}

const response = query({
  prompt: messageGenerator(),
  options: { maxTurns: 1000 }
});
v4: The Architecture That Works
The Core Design
┌─────────────────────────────────────────────────────────┐
│ CLAUDE CODE SESSION │
│ User → Claude → Tools (Read, Edit, Write, Bash) │
│ ↓ │
│ PostToolUse Hook │
│ (queues observation) │
└─────────────────────────────────────────────────────────┘
↓ SQLite queue
┌─────────────────────────────────────────────────────────┐
│ SDK WORKER PROCESS │
│ ONE streaming session per Claude Code session │
│ │
│ AsyncIterable<UserMessage> │
│ → Yields observations from queue │
│ → SDK compresses via AI │
│ → Parses XML responses │
│ → Stores in database │
└─────────────────────────────────────────────────────────┘
↓ SQLite storage
┌─────────────────────────────────────────────────────────┐
│ NEXT SESSION │
│ SessionStart Hook │
│ → Queries database │
│ → Returns progressive disclosure index │
│ → Agent fetches details via MCP │
└─────────────────────────────────────────────────────────┘
The Five Hook Architecture
SessionStart
Purpose: Inject context from previous sessions
Timing: When Claude Code starts
What it does:
- Queries last 10 session summaries
- Formats as progressive disclosure index
- Injects into context via stdout
Key change from v3:
- ✅ Index format (not full details)
- ✅ Token counts visible
- ✅ MCP search instructions included
UserPromptSubmit
Purpose: Initialize session tracking
Timing: Before Claude processes prompt
What it does:
- Creates session record
- Saves raw user prompt (v4.2.0+)
- Starts worker if needed
Key change from v3:
- ✅ Stores raw prompts for search
- ✅ Auto-starts worker service
PostToolUse
Purpose: Capture tool observations
Timing: After every tool execution
What it does:
- Enqueues observation in database
- Returns immediately
Key change from v3:
- ✅ Just enqueues (doesn’t process)
- ✅ Worker handles all AI calls
Summary
Purpose: Generate session summaries
Timing: Worker-triggered (mid-session)
What it does:
- Gathers observations
- Sends to Claude for summarization
- Stores structured summary
Key change from v3:
- ✅ Multiple summaries per session
- ✅ Summaries are checkpoints, not endings
SessionEnd
Purpose: Graceful cleanup
Timing: When session ends
What it does:
- Marks session complete
- Lets worker finish processing
Key change from v3:
- ✅ Graceful (not aggressive)
- ✅ No DELETE requests
- ✅ Worker finishes naturally
Database Schema Evolution
v3 schema:
-- Simple, flat structure
CREATE TABLE observations (
id INTEGER PRIMARY KEY,
session_id TEXT,
text TEXT,
created_at INTEGER
);
v4 schema:
-- Rich, structured schema
CREATE TABLE observations (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  session_id TEXT NOT NULL,
  project TEXT NOT NULL,

  -- Progressive disclosure metadata
  title TEXT NOT NULL,
  subtitle TEXT,
  type TEXT NOT NULL,  -- decision, bugfix, feature, etc.

  -- Content
  narrative TEXT NOT NULL,
  facts TEXT,          -- JSON array

  -- Searchability
  concepts TEXT,       -- JSON array of tags
  files_read TEXT,     -- JSON array
  files_modified TEXT, -- JSON array

  -- Timestamps
  created_at TEXT NOT NULL,
  created_at_epoch INTEGER NOT NULL,

  FOREIGN KEY(session_id) REFERENCES sdk_sessions(id)
);

-- FTS5 for full-text search
CREATE VIRTUAL TABLE observations_fts USING fts5(
  title, subtitle, narrative, facts, concepts,
  content=observations
);

-- Auto-sync triggers
CREATE TRIGGER observations_ai AFTER INSERT ON observations BEGIN
  INSERT INTO observations_fts(rowid, title, subtitle, narrative, facts, concepts)
  VALUES (new.id, new.title, new.subtitle, new.narrative, new.facts, new.concepts);
END;
What changed:
- ✅ Structured fields (title, subtitle, type)
- ✅ FTS5 full-text search
- ✅ Project-scoped queries
- ✅ Rich metadata for progressive disclosure
Worker Service Redesign
v3 worker:
// Multiple short SDK sessions
app.post('/process', async (req, res) => {
  const response = await query({
    prompt: buildPrompt(req.body),
    options: { maxTurns: 1 }
  });
  for await (const msg of response) {
    // Process single observation
  }
  res.json({ success: true });
});
v4 worker:
// ONE long-running SDK session
async function runWorker(sessionId: string) {
  const response = query({
    prompt: messageGenerator(), // AsyncIterable
    options: { maxTurns: 1000 }
  });
  for await (const msg of response) {
    if (msg.type === 'text') {
      parseObservations(msg.content);
      parseSummaries(msg.content);
    }
  }
}
Benefits:
- Maintains conversation state
- SDK handles context automatically
- More efficient (fewer API calls)
- Natural multi-turn flow
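The v4 worker calls `parseObservations` on each text message, and the architecture diagram mentions parsing XML responses. The exact XML format is not shown in this document, so the sketch below assumes a hypothetical layout: `<observation>` blocks containing `<type>`, `<title>`, and `<narrative>` children. It illustrates the parsing step only and is not the project's actual parser.

```typescript
interface ParsedObservation {
  type: string;
  title: string;
  narrative: string;
}

// Pull the text of the first <tag>...</tag> pair, or null if it is absent.
function extractTag(xml: string, tag: string): string | null {
  const match = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`).exec(xml);
  return match ? match[1].trim() : null;
}

// Hypothetical element names; incomplete blocks are skipped rather than stored.
function parseObservationBlocks(text: string): ParsedObservation[] {
  const blocks = text.match(/<observation>[\s\S]*?<\/observation>/g) ?? [];
  const parsed: ParsedObservation[] = [];
  for (const block of blocks) {
    const type = extractTag(block, 'type');
    const title = extractTag(block, 'title');
    const narrative = extractTag(block, 'narrative');
    if (type && title && narrative) {
      parsed.push({ type, title, narrative });
    }
  }
  return parsed;
}
```

Skipping malformed blocks instead of throwing keeps one bad model response from stalling a long-running streaming session.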
Critical Fixes Along the Way
Fix 1: Context Injection Pollution (v4.3.1)
Problem: SessionStart hook output polluted with npm install logs
# Hook output contained:
npm WARN deprecated ...
npm WARN deprecated ...
{"hookSpecificOutput": {"additionalContext": "..."}}
Why it broke:
- Claude Code expects clean JSON or plain text
- stderr/stdout from npm install mixed with hook output
- Context didn’t inject properly
Solution:
{
"command": "npm install --loglevel=silent && node context-hook.js"
}
Result: Clean JSON output, context injection works
Fix 2: Double Shebang Issue (v4.3.1)
Problem: Hook executables had duplicate shebangs
#!/usr/bin/env node
#!/usr/bin/env node // ← Duplicate!
// Rest of code...
Why it happened:
- Source files had shebang
- esbuild added another shebang during build
Solution:
// Remove shebangs from source files
// Let esbuild add them during build
Result: Clean executables, no parsing errors
Fix 3: FTS5 Injection Vulnerability (v4.2.3)
Problem: User input passed directly to FTS5 query
// ❌ Vulnerable
const results = db.query(
`SELECT * FROM observations_fts WHERE observations_fts MATCH '${userQuery}'`
);
Attack:
userQuery = "'; DROP TABLE observations; --"
Solution:
// ✅ Safe: Use parameterized queries
const results = db.query(
'SELECT * FROM observations_fts WHERE observations_fts MATCH ?',
[userQuery]
);
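Parameter binding stops SQL injection, but the bound value is still parsed as an FTS5 query expression, so characters such as `"` or `*`, or a bare operator word like `NOT`, can cause a syntax error or change the query's meaning. One possible extra guard, sketched here with a hypothetical `toFts5Query` helper, is to quote every term as an FTS5 string, doubling any embedded double quotes.

```typescript
// Wrap each whitespace-separated term in double quotes so FTS5 treats it as a
// literal string. Embedded double quotes are escaped by doubling them.
function toFts5Query(input: string): string {
  return input
    .split(/\s+/)
    .filter((term) => term.length > 0)
    .map((term) => `"${term.replace(/"/g, '""')}"`)
    .join(' ');
}
```

The quoted terms are combined with FTS5's implicit AND. An empty result means there is nothing to search for, so the caller should skip the MATCH query in that case. The trade-off is that users lose access to FTS5 operators such as prefix matching.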
Fix 4: NOT NULL Constraint Violation (v4.2.8)
Problem: Session creation failed when prompt was empty
INSERT INTO sdk_sessions (claude_session_id, user_prompt, ...)
VALUES ('abc123', NULL, ...) -- ❌ user_prompt is NOT NULL
Solution:
// Allow NULL user_prompts
user_prompt: input.prompt ?? null
Schema change:
-- Before
user_prompt TEXT NOT NULL
-- After
user_prompt TEXT -- Nullable
Optimization 1: Prepared Statements
Before:
for (const obs of observations) {
  db.run(`INSERT INTO observations (...) VALUES (?, ?, ...)`, [obs.id, obs.text, ...]);
}
After:
const stmt = db.prepare(`INSERT INTO observations (...) VALUES (?, ?, ...)`);
for (const obs of observations) {
  stmt.run([obs.id, obs.text, ...]);
}
stmt.finalize();
Impact: 5x faster bulk inserts
Optimization 2: FTS5 Indexing
Before:
// Manual full-text search
const results = db.query(
`SELECT * FROM observations WHERE text LIKE '%${query}%'`
);
After:
// FTS5 virtual table
const results = db.query(
`SELECT * FROM observations_fts WHERE observations_fts MATCH ?`,
[query]
);
Impact: 100x faster searches on large datasets
Optimization 3: Index Format by Default
Before:
// Always return full observations
search_observations({ query: "hooks" });
// Returns: 5,000 tokens
After:
// Default to index format
search_observations({ query: "hooks", format: "index" });
// Returns: 200 tokens
// Fetch full only when needed
search_observations({ query: "hooks", format: "full", limit: 1 });
// Returns: 150 tokens
Impact: 25x reduction in average search result size
What We Learned
Lesson 1: Context is Precious
Principle: Every token you put in context window costs attention.
Application:
- Progressive disclosure reduces waste by 87%
- Index-first approach gives agent control
- Token counts make costs visible
Lesson 2: Session State is Complicated
Principle: Distributed state is hard. SDK handles it better than we can.
Application:
- Use SDK’s built-in session resumption
- Don’t try to manually reconstruct state
- Track session IDs from init messages
Lesson 3: Graceful Beats Aggressive
Principle: Let processes finish their work before terminating.
Application:
- Graceful cleanup prevents data loss
- Workers finish important operations
- Clean state transitions reduce bugs
Lesson 4: AI is the Compressor
Principle: Don’t compress manually. Let AI do semantic compression.
Application:
- 10:1 to 100:1 compression ratios
- Semantic understanding, not keyword extraction
- Structured outputs (XML parsing)
Lesson 5: Progressive Everything
Principle: Show metadata first, fetch details on-demand.
Application:
- Progressive disclosure in context injection
- Index format in search results
- Layer 1 (titles) → Layer 2 (summaries) → Layer 3 (full details)
The Road Ahead
Planned: Adaptive Index Size
SessionStart({ source: "startup" }):
→ Show last 10 sessions (normal)
SessionStart({ source: "resume" }):
→ Show only current session (minimal)
SessionStart({ source: "compact" }):
→ Show last 20 sessions (comprehensive)
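Since this feature is only planned, the following is a sketch of how the mapping above might look as code. The type and function names are hypothetical; only the three source values and their session counts come from the plan above.

```typescript
type SessionStartSource = 'startup' | 'resume' | 'compact';

// Number of recent sessions to include in the injected index.
function sessionsToShow(source: SessionStartSource): number {
  switch (source) {
    case 'startup':
      return 10; // normal
    case 'resume':
      return 1; // minimal: current session only
    case 'compact':
      return 20; // comprehensive
  }
}
```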
Planned: Relevance Scoring
// Use embeddings to pre-sort index by semantic relevance
search_observations({
query: "authentication bug",
sort: "relevance" // Based on embeddings
});
Planned: Multi-Project Context
// Cross-project pattern recognition
search_observations({
query: "API rate limiting",
projects: ["api-gateway", "user-service", "billing-service"]
});
Planned: Collaborative Memory
// Team-shared observations (optional)
createObservation({
title: "Rate limit: 100 req/min",
scope: "team" // vs "user"
});
Migration Guide: v3 → v5
Step 1: Backup Database
cp ~/.claude-mem/claude-mem.db ~/.claude-mem/claude-mem-v3-backup.db
Step 2: Pull Latest Source
cd ~/.claude/plugins/marketplaces/thedotmack
git pull
Step 3: Update Plugin
/plugin update claude-mem
What happens automatically:
- Dependencies update (including new ones like chromadb for v5.0.0+)
- Database schema migrations run automatically
- Worker service restarts with new code
- Smart install caching activates (v5.0.3+)
Step 4: Test
# Start Claude Code
claude
# Check that context is injected
# (Should see progressive disclosure index with v5 viewer link)
# Open viewer UI (v5.1.0+)
open http://localhost:37777
# Submit a prompt and watch real-time updates in viewer
Step 5: Explore New Features
# View memory stream in browser (v5.1.0+)
open http://localhost:37777
# Toggle theme (v5.1.2+)
# Click theme button in viewer header
# Check worker health
npm run worker:status
curl http://localhost:37777/health
Key Metrics
v3:
| Metric | Value |
|---|---|
| Context usage per session | ~25,000 tokens |
| Relevant context | ~2,000 tokens (8%) |
| Hook execution time | ~200ms |
| Search latency | ~500ms (LIKE queries) |
v4:
| Metric | Value |
|---|---|
| Context usage per session | ~1,100 tokens |
| Relevant context | ~1,100 tokens (100%) |
| Hook execution time | ~45ms |
| Search latency | ~15ms (FTS5) |
v5:
| Metric | Value |
|---|---|
| Context usage per session | ~1,100 tokens |
| Relevant context | ~1,100 tokens (100%) |
| Hook execution time | ~10ms (cached install) |
| Search latency | ~12ms (FTS5) or ~25ms (hybrid) |
| Viewer UI load time | ~50ms (bundled HTML) |
| SSE update latency | ~5ms (real-time) |
v3 → v4 Improvements:
- 96% reduction in context waste
- 12x increase in relevance
- 4x faster hooks
- 33x faster search
v4 → v5 Improvements:
- 78% faster hooks (smart caching)
- Real-time visualization (viewer UI)
- Better search relevance (hybrid)
- Enhanced UX (theme toggle, persistence)
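The headline improvement figures follow directly from the metric tables. A short sketch of the arithmetic, with helper names chosen only for illustration:

```typescript
// Percentage drop from `before` to `after`, rounded to the nearest whole percent.
function percentReduction(before: number, after: number): number {
  return Math.round((1 - after / before) * 100);
}

// How many times faster `after` is than `before`, rounded to the nearest integer.
function speedup(before: number, after: number): number {
  return Math.round(before / after);
}

const contextReduction = percentReduction(25000, 1100); // context tokens per session
const hookSpeedup = speedup(200, 45); // hook time in ms
const searchSpeedup = speedup(500, 15); // LIKE vs FTS5 latency in ms
const cachedHookGain = percentReduction(45, 10); // hook time with cached install
const relevanceGain = 100 / 8; // share of injected context that is relevant
```

These evaluate to 96, 4, 33, and 78 respectively, and the relevance ratio is 12.5, which the list above rounds down to 12x.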
Conclusion
The journey from v3 to v5 was about understanding these fundamental truths:
- Context is finite - Progressive disclosure respects attention budget
- AI is the compressor - Semantic understanding beats keyword extraction
- Agents are smart - Let them decide what to fetch
- State is hard - Use SDK’s built-in mechanisms
- Graceful wins - Let processes finish cleanly
The result is a memory system that’s both powerful and invisible. Users never notice it working - Claude just gets smarter over time.
v5 adds visibility: Now users CAN see the memory system working if they want (via viewer UI), but it’s still non-intrusive.
This architecture evolution reflects hundreds of hours of experimentation, dozens of dead ends, and the invaluable experience of real-world usage. v5 is the architecture that emerged from understanding what actually works - and making it visible to users.