Architecture Evolution: The Journey from v3 to v5
The Problem We Solved
Goal: Create a memory system that makes Claude smarter across sessions without the user noticing it exists.
Challenge: How do you observe AI agent behavior, compress it intelligently, and serve it back at the right time - all without slowing down or interfering with the main workflow?
This is the story of how claude-mem evolved from a simple idea to a production-ready system, and the key architectural decisions that made it work.
v5.x: Maturity and User Experience
After establishing the solid v4 architecture, v5.x focused on user experience, visualization, and polish.
v5.1.2: Theme Toggle (November 2025)
What Changed: Added light/dark mode theme toggle to the viewer UI
New Features:
- User-selectable theme preference (light, dark, system)
- Persistent theme settings in localStorage
- Smooth theme transitions
- System preference detection
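A minimal sketch of how a preference like this can be persisted and resolved in the browser. The storage key and helper names are illustrative, not the viewer's actual code:

```ts
type ThemePreference = 'light' | 'dark' | 'system';

// Resolve the effective theme, falling back to the OS preference for 'system'.
function resolveTheme(pref: ThemePreference): 'light' | 'dark' {
  if (pref !== 'system') return pref;
  return window.matchMedia('(prefers-color-scheme: dark)').matches ? 'dark' : 'light';
}

// Persist the user's choice and apply it via a data attribute the CSS can target.
function applyTheme(pref: ThemePreference): void {
  localStorage.setItem('claude-mem:theme', pref); // key is an assumption
  document.documentElement.dataset.theme = resolveTheme(pref);
}

// On startup, restore the stored preference (default to 'system').
applyTheme((localStorage.getItem('claude-mem:theme') ?? 'system') as ThemePreference);
```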
v5.1.1: PM2 Windows Fix (November 2025)
The Problem: PM2 startup failed on Windows with ENOENT error
Root Cause:
v5.1.0: Web-Based Viewer UI (October 2025)
The Breakthrough: Real-time visualization of the memory stream
What We Built:
- React-based web UI at http://localhost:37777
- Server-Sent Events (SSE) for real-time updates
- Infinite scroll pagination
- Project filtering
- Settings persistence (sidebar state, selected project)
- Auto-reconnection with exponential backoff
- GPU-accelerated animations
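A sketch of the reconnection pattern: close the failed EventSource and reopen it after an exponentially growing delay. The /events path and handler names are assumptions for illustration, not the viewer's actual code:

```ts
// SSE client with manual exponential backoff on errors.
function connect(url: string, onMessage: (data: string) => void, attempt = 0): void {
  const source = new EventSource(url);

  source.onopen = () => { attempt = 0; };            // reset backoff once connected
  source.onmessage = (event) => onMessage(event.data);

  source.onerror = () => {
    source.close();
    const delay = Math.min(30_000, 1_000 * 2 ** attempt); // 1s, 2s, 4s ... capped at 30s
    setTimeout(() => connect(url, onMessage, attempt + 1), delay);
  };
}

connect('http://localhost:37777/events', (data) => console.log('update:', data));
```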
v5.0.3: Smart Install Caching (October 2025)
The Problem: npm install ran on every SessionStart (2-5 seconds)
The Insight: Dependencies rarely change between sessions
The Solution: Version-based caching
- Does node_modules exist?
- Does .install-version match the package.json version?
- Is better-sqlite3 present?
- SessionStart hook: 2-5 seconds → 10ms (99.5% faster)
- Only installs on: first run, version change, missing deps
- Better Windows error messages with build tool help
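The check itself is tiny. A sketch of the idea, using the files named above (the function name and directory handling are illustrative, not the plugin's actual code):

```ts
import { existsSync, readFileSync } from 'node:fs';
import { join } from 'node:path';

// Returns true when `npm install` can be skipped entirely.
function installIsFresh(pluginDir: string): boolean {
  const pkg = JSON.parse(readFileSync(join(pluginDir, 'package.json'), 'utf8'));
  const versionFile = join(pluginDir, '.install-version');

  return (
    existsSync(join(pluginDir, 'node_modules')) &&                  // deps installed at all?
    existsSync(versionFile) &&
    readFileSync(versionFile, 'utf8').trim() === pkg.version &&     // same plugin version?
    existsSync(join(pluginDir, 'node_modules', 'better-sqlite3'))   // native dep built?
  );
}
```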
v5.0.2: Worker Health Checks (October 2025)
What Changed: More robust worker startup and monitoring
New Features:
- Graceful degradation when the worker is down
- Auto-recovery from crashes
- Better error messages for debugging
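A sketch of what a startup health check with graceful degradation can look like. The health URL and the PM2 process name are assumptions, not claude-mem's actual values:

```ts
import { execSync } from 'node:child_process';

async function ensureWorker(url = 'http://localhost:37777/health'): Promise<boolean> {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(1000) });
    if (res.ok) return true;
  } catch {
    // fall through to a restart attempt
  }
  try {
    execSync('pm2 restart claude-mem-worker', { stdio: 'ignore' }); // name is an assumption
    return true;
  } catch {
    return false; // degrade gracefully: hooks keep working without live processing
  }
}
```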
v5.0.1: Stability Improvements (October 2025)
What Changed: Various bug fixes and stability enhancements
Key Fixes:
- Fixed race conditions in observation queue processing
- Improved error handling in SDK worker
- Better cleanup of stale PM2 processes
- Enhanced logging for debugging
v5.0.0: Hybrid Search Architecture (October 2025)
The Evolution: SQLite FTS5 + Chroma vector search
What We Added:
- chromadb - Vector database for semantic search
- Python 3.8+ - Required by chromadb
- FTS5: Fast keyword matching, no dependencies
- Chroma: Semantic understanding, finds related concepts
- Graceful degradation: Works without Chroma (FTS5 only)
- Added Python dependency (optional)
- Increased installation complexity
- Better search relevance
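A sketch of the graceful-degradation path: try semantic search first, and fall back to FTS5 when Chroma is unavailable. The chromaQuery wrapper, database path, and table name are illustrative stand-ins, not the real integration:

```ts
import Database from 'better-sqlite3';

// Hypothetical wrapper around the Chroma client; resolves to null if Chroma is absent.
declare function chromaQuery(text: string, limit: number): Promise<string[] | null>;

const db = new Database('claude-mem.db'); // path is illustrative

async function search(query: string, limit = 10): Promise<string[]> {
  const semantic = await chromaQuery(query, limit).catch(() => null);
  if (semantic && semantic.length > 0) return semantic;

  // FTS5 fallback: fast keyword matching, no external dependencies.
  // Assumes `query` has already been sanitized for FTS5 syntax (see Fix 3 below).
  const rows = db
    .prepare('SELECT title FROM observations_fts WHERE observations_fts MATCH ? LIMIT ?')
    .all(query, limit) as { title: string }[];
  return rows.map((r) => r.title);
}
```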
v1-v2: The Naive Approach
The First Attempt: Dump Everything
Architecture:
- ❌ Context pollution (thousands of tokens of irrelevant data)
- ❌ No compression (raw tool outputs are verbose)
- ❌ No search (had to scan everything linearly)
- ✅ Proved the concept: Memory across sessions is valuable
v3: Smart Compression, Wrong Architecture
The Breakthrough: AI-Powered Compression
New idea: Use Claude itself to compress observations
Architecture:
- Claude Agent SDK integration - Use AI to compress observations
- Background worker - Don’t block main session
- Structured observations - Extract facts, decisions, insights
- Session summaries - Generate comprehensive summaries
- ✅ Compression ratio: 10:1 to 100:1
- ✅ Semantic understanding (not just keyword matching)
- ✅ Background processing (hooks stayed fast)
- ✅ Search became useful
- ❌ Still loaded everything upfront
- ❌ Session ID management was broken
- ❌ Aggressive cleanup interrupted summaries
- ❌ Multiple SDK sessions per Claude Code session
The Key Realizations
Realization 1: Progressive Disclosure
Problem: Even compressed observations can pollute context if you load them all.
Insight: Humans don’t read everything before starting work. Why should AI?
Solution: Show an index first, fetch details on-demand.
- 87% reduction in context usage
- 100% relevance (only fetch what’s needed)
- Agent autonomy (decides what’s relevant)
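A sketch of the index-first pattern, using hypothetical types rather than claude-mem's real ones: the agent sees a compact, costed index, and full details are looked up by id only when it decides they matter:

```ts
// Layer 1: a compact index entry; details are fetched only if the agent asks.
interface IndexEntry {
  id: number;
  title: string;
  tokens: number; // approximate cost of fetching the full observation
}

function formatIndex(entries: IndexEntry[]): string {
  const lines = entries.map((e) => `- [#${e.id}] ${e.title} (~${e.tokens} tokens)`);
  return ['Recent session summaries (fetch details on demand):', ...lines].join('\n');
}

// Layers 2/3: full details are looked up by id only when actually needed.
declare function fetchObservation(id: number): string;
```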
Realization 2: Session ID Chaos
Problem: SDK session IDs change on every turn.
What we thought:
- Can’t resume sessions without tracking ID updates
- Session state gets lost between turns
- Observations get orphaned
Realization 3: Graceful vs Aggressive Cleanup
v3 approach:
- Summary generation interrupted mid-process
- Pending observations lost
- Race conditions everywhere
v4 approach:
- Summaries complete successfully
- No lost observations
- Clean state transitions
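A sketch of graceful shutdown in the worker: stop accepting new work, drain what is queued, and let the in-flight summary finish before exiting. The queue and summary helpers are hypothetical stand-ins for the worker's real internals:

```ts
declare function drainObservationQueue(): Promise<void>;
declare function finishPendingSummary(): Promise<void>;

let shuttingDown = false;

process.on('SIGTERM', async () => {
  if (shuttingDown) return;
  shuttingDown = true;

  await drainObservationQueue(); // persist anything already queued
  await finishPendingSummary();  // let the in-flight summary complete
  process.exit(0);               // only now release the process
});
```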
Realization 4: One Session, Not Many
Problem: We were creating multiple SDK sessions per Claude Code session.
What we thought:
- SDK maintains conversation state
- Context accumulates naturally
- Much more efficient
v4: The Architecture That Works
The Core Design
The Five Hook Architecture
- SessionStart
- UserPromptSubmit
- PostToolUse
- Summary
- SessionEnd
Purpose: Inject context from previous sessions
Timing: When Claude Code starts
What it does:
- Queries last 10 session summaries
- Formats as progressive disclosure index
- Injects into context via stdout
- ✅ Index format (not full details)
- ✅ Token counts visible
- ✅ MCP search instructions included
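A minimal sketch of that flow: query the most recent summaries and print only the index on stdout. The table, column, and environment variable names are illustrative, not the real schema:

```ts
import Database from 'better-sqlite3';

const db = new Database(process.env.CLAUDE_MEM_DB ?? 'claude-mem.db', { readonly: true });

const rows = db
  .prepare(
    `SELECT id, title, token_count FROM session_summaries
     WHERE project = ? ORDER BY created_at DESC LIMIT 10`,
  )
  .all(process.cwd()) as { id: number; title: string; token_count: number }[];

// Only the index goes to stdout; anything else would pollute the injected context.
const index = rows.map((r) => `- [#${r.id}] ${r.title} (~${r.token_count} tokens)`);
console.log(
  ['Previous sessions (use the claude-mem search tools for details):', ...index].join('\n'),
);
```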
Database Schema Evolution
v3 schema:
v4 schema:
- ✅ Structured fields (title, subtitle, type)
- ✅ FTS5 full-text search
- ✅ Project-scoped queries
- ✅ Rich metadata for progressive disclosure
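For illustration, a v4-style schema with structured fields and an external-content FTS5 index might look like this; the table and column names are assumptions, not the actual migration:

```ts
import Database from 'better-sqlite3';

const db = new Database('claude-mem.db');

// Structured observation rows plus an FTS5 index over the text fields
// (the triggers that keep the index in sync are omitted here).
db.exec(`
  CREATE TABLE IF NOT EXISTS observations (
    id         INTEGER PRIMARY KEY,
    project    TEXT NOT NULL,
    type       TEXT NOT NULL,          -- fact | decision | insight
    title      TEXT NOT NULL,
    subtitle   TEXT,
    body       TEXT NOT NULL,
    created_at TEXT DEFAULT (datetime('now'))
  );

  CREATE VIRTUAL TABLE IF NOT EXISTS observations_fts
    USING fts5(title, subtitle, body, content='observations', content_rowid='id');
`);
```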
Worker Service Redesign
v3 worker:
v4 worker:
- Maintains conversation state
- SDK handles context automatically
- More efficient (fewer API calls)
- Natural multi-turn flow
Critical Fixes Along the Way
Fix 1: Context Injection Pollution (v4.3.1)
Problem: SessionStart hook output polluted with npm install logs
- Claude Code expects clean JSON or plain text
- stderr/stdout from npm install mixed with hook output
- Context didn’t inject properly
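The essence of the fix is keeping installer noise off the hook's stdout. A sketch of that idea, with an illustrative log path and plugin-directory variable rather than the shipped code:

```ts
import { spawnSync } from 'node:child_process';
import { openSync } from 'node:fs';

const pluginDir = process.env.CLAUDE_MEM_PLUGIN_DIR ?? process.cwd(); // assumption

// Anything printed on stdout is injected as context, so npm's output goes to a log file.
const log = openSync('/tmp/claude-mem-install.log', 'a');
spawnSync('npm', ['install', '--no-fund', '--no-audit'], {
  cwd: pluginDir,
  stdio: ['ignore', log, log], // stdin ignored, stdout + stderr redirected to the log
});
```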
Fix 2: Double Shebang Issue (v4.3.1)
Problem: Hook executables had duplicate shebangs
- Source files had a shebang
- esbuild added another shebang during build
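One way to end up with exactly one shebang is to keep it out of the source files and let the build add it via esbuild's banner option. A sketch of that approach (entry point and output paths are illustrative, and the shipped fix may differ):

```ts
import { build } from 'esbuild';

// Source files carry no shebang; the build adds exactly one to the output.
await build({
  entryPoints: ['src/hooks/session-start.ts'],
  bundle: true,
  platform: 'node',
  outfile: 'dist/session-start.js',
  banner: { js: '#!/usr/bin/env node' },
});
```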
Fix 3: FTS5 Injection Vulnerability (v4.2.3)
Problem: User input passed directly to the FTS5 query
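A common mitigation, shown here as a sketch rather than claude-mem's exact sanitizer: bind the query as a statement parameter and quote every user-supplied term so FTS5 operators (AND, OR, NEAR, *) become inert:

```ts
import Database from 'better-sqlite3';

const db = new Database('claude-mem.db'); // path is illustrative

// Treat every term as a literal: escape embedded quotes, then wrap in double quotes.
function toFtsQuery(input: string): string {
  return input
    .split(/\s+/)
    .filter(Boolean)
    .map((term) => `"${term.replace(/"/g, '""')}"`)
    .join(' ');
}

function searchObservations(userInput: string) {
  return db
    .prepare('SELECT rowid, title FROM observations_fts WHERE observations_fts MATCH ?')
    .all(toFtsQuery(userInput));
}
```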
Fix 4: NOT NULL Constraint Violation (v4.2.8)
Problem: Session creation failed when the prompt was empty
Performance Improvements
Optimization 1: Prepared Statements
Before:
Optimization 2: FTS5 Indexing
Before:
Optimization 3: Index Format Default
Before:
What We Learned
Lesson 1: Context is Precious
Principle: Every token you put in the context window costs attention.
Application:
- Progressive disclosure reduces waste by 87%
- Index-first approach gives agent control
- Token counts make costs visible
Lesson 2: Session State is Complicated
Principle: Distributed state is hard. The SDK handles it better than we can.
Application:
- Use SDK’s built-in session resumption
- Don’t try to manually reconstruct state
- Track session IDs from init messages
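A sketch of what tracking those IDs looks like in practice. The message shape here is a hypothetical stand-in, not the SDK's actual types; the pattern is simply to capture the ID announced at init and persist every change:

```ts
interface WorkerMessage {
  type: 'init' | 'assistant' | 'result';
  session_id?: string;
}

let currentSessionId: string | null = null;

function trackSession(msg: WorkerMessage, persist: (id: string) => void): void {
  if (msg.session_id && msg.session_id !== currentSessionId) {
    currentSessionId = msg.session_id; // IDs can change between turns
    persist(currentSessionId);         // keep stored records pointing at the latest ID
  }
}
```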
Lesson 3: Graceful Beats Aggressive
Principle: Let processes finish their work before terminating.
Application:
- Graceful cleanup prevents data loss
- Workers finish important operations
- Clean state transitions reduce bugs
Lesson 4: AI is the Compressor
Principle: Don’t compress manually. Let AI do semantic compression.
Application:
- 10:1 to 100:1 compression ratios
- Semantic understanding, not keyword extraction
- Structured outputs (XML parsing)
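In claude-mem the compression runs through the Claude Agent SDK worker; as a self-contained illustration, here is the same idea with the plain Anthropic SDK, asking for an XML-structured observation and parsing it. The prompt, tag names, and model choice are illustrative:

```ts
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Compress a raw tool transcript into a structured observation.
async function compressObservation(rawTranscript: string) {
  const response = await client.messages.create({
    model: 'claude-sonnet-4-5', // illustrative model choice
    max_tokens: 512,
    messages: [{
      role: 'user',
      content:
        'Compress the following tool activity into <observation><title>…</title>' +
        '<type>fact|decision|insight</type><body>…</body></observation>:\n\n' +
        rawTranscript,
    }],
  });

  const block = response.content[0];
  const text = block.type === 'text' ? block.text : '';
  const pick = (tag: string) =>
    text.match(new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`))?.[1]?.trim() ?? '';
  return { title: pick('title'), type: pick('type'), body: pick('body') };
}
```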
Lesson 5: Progressive Everything
Principle: Show metadata first, fetch details on-demand.
Application:
- Progressive disclosure in context injection
- Index format in search results
- Layer 1 (titles) → Layer 2 (summaries) → Layer 3 (full details)
The Road Ahead
Planned: Adaptive Index Size
Planned: Relevance Scoring
Planned: Multi-Project Context
Planned: Collaborative Memory
Migration Guide: v3 → v5
Step 1: Backup Database
Step 2: Update Plugin
Step 3: Update Plugin
- Dependencies update (including new ones like chromadb for v5.0.0+)
- Database schema migrations run automatically
- Worker service restarts with new code
- Smart install caching activates (v5.0.3+)
Step 4: Test
Step 5: Explore New Features
Key Metrics
v3 Performance
| Metric | Value |
|---|---|
| Context usage per session | ~25,000 tokens |
| Relevant context | ~2,000 tokens (8%) |
| Hook execution time | ~200ms |
| Search latency | ~500ms (LIKE queries) |
v4 Performance
| Metric | Value |
|---|---|
| Context usage per session | ~1,100 tokens |
| Relevant context | ~1,100 tokens (100%) |
| Hook execution time | ~45ms |
| Search latency | ~15ms (FTS5) |
v5 Performance
| Metric | Value |
|---|---|
| Context usage per session | ~1,100 tokens |
| Relevant context | ~1,100 tokens (100%) |
| Hook execution time | ~10ms (cached install) |
| Search latency | ~12ms (FTS5) or ~25ms (hybrid) |
| Viewer UI load time | ~50ms (bundled HTML) |
| SSE update latency | ~5ms (real-time) |
v3 → v4:
- 96% reduction in context waste
- 12x increase in relevance
- 4x faster hooks
- 33x faster search
v4 → v5:
- 78% faster hooks (smart caching)
- Real-time visualization (viewer UI)
- Better search relevance (hybrid)
- Enhanced UX (theme toggle, persistence)
Conclusion
The journey from v3 to v5 was about understanding these fundamental truths:
- Context is finite - Progressive disclosure respects attention budget
- AI is the compressor - Semantic understanding beats keyword extraction
- Agents are smart - Let them decide what to fetch
- State is hard - Use SDK’s built-in mechanisms
- Graceful wins - Let processes finish cleanly
Further Reading
- Progressive Disclosure - The philosophy behind v4
- Hooks Architecture - How hooks power the system
- Context Engineering - Foundational principles
- Viewer UI - Real-time visualization (v5.1.0+)
This architecture evolution reflects hundreds of hours of experimentation, dozens of dead ends, and the invaluable experience of real-world usage. v5 is the architecture that emerged from understanding what actually works - and making it visible to users.

