# Architecture Evolution

> How claude-mem evolved from v3 to v5+

## The Problem We Solved

**Goal:** Create a memory system that makes Claude smarter across sessions without the user noticing it exists.

**Challenge:** How do you observe AI agent behavior, compress it intelligently, and serve it back at the right time - all without slowing down or interfering with the main workflow?

This is the story of how claude-mem evolved from a simple idea to a production-ready system, and the key architectural decisions that made it work.

***

## v5.x: Maturity and User Experience

After establishing the solid v4 architecture, v5.x focused on user experience, visualization, and polish.

### v5.1.2: Theme Toggle (November 2025)

**What Changed**: Added light/dark mode theme toggle to viewer UI

**New Features**:

* User-selectable theme preference (light, dark, system)
* Persistent theme settings in localStorage
* Smooth theme transitions
* System preference detection

**Implementation**:

```typescript theme={null}
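// Theme context with persistence
import { useEffect, useState, type ReactNode } from 'react';

type Theme = 'light' | 'dark' | 'system';

const ThemeProvider = ({ children }: { children: ReactNode }) => {
  const [theme, setTheme] = useState<Theme>(() => {
    // getItem returns string | null, so validate before trusting the stored value
    const stored = localStorage.getItem('claude-mem-theme');
    return stored === 'light' || stored === 'dark' || stored === 'system' ? stored : 'system';
  });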

  useEffect(() => {
    localStorage.setItem('claude-mem-theme', theme);
  }, [theme]);

  return (
    <ThemeContext.Provider value={{ theme, setTheme }}>
      {children}
    </ThemeContext.Provider>
  );
};
```

**Why It Matters**: Users working in different lighting conditions can now customize the viewer for comfort.
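
The `system` option still has to be resolved to a concrete theme before rendering. A minimal sketch of that step, assuming hypothetical names (`ThemePreference`, `resolveTheme`) that are not taken from the claude-mem source:

```typescript
// Hypothetical helper; names are illustrative, not from the claude-mem codebase.
type ThemePreference = 'light' | 'dark' | 'system';
type ResolvedTheme = 'light' | 'dark';

// Map the stored preference to the theme to render.
// 'system' defers to the operating system setting passed in by the caller.
function resolveTheme(preference: ThemePreference, systemPrefersDark: boolean): ResolvedTheme {
  if (preference === 'system') {
    return systemPrefersDark ? 'dark' : 'light';
  }
  return preference;
}
```

In a browser, `systemPrefersDark` would typically come from `window.matchMedia('(prefers-color-scheme: dark)').matches`.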

### v5.1.1: Worker Startup Fix (November 2025) - Now Deprecated

**Note**: This section describes a historical PM2-based approach that has been replaced with Bun in later versions.

**The Problem**: Worker startup failed on Windows with ENOENT error when using PM2

**Historical Solution**: Used full path to PM2 binary instead of relying on PATH

**Current Approach**: The project now uses Bun for process management, which provides better cross-platform compatibility and eliminates these PATH-related issues.

**Impact**: Cross-platform compatibility was restored, so Windows users could run claude-mem without the ENOENT startup failure.

### v5.1.0: Web-Based Viewer UI (October 2025)

**The Breakthrough**: Real-time visualization of memory stream

**What We Built**:

* React-based web UI at [http://localhost:37777](http://localhost:37777)
* Server-Sent Events (SSE) for real-time updates
* Infinite scroll pagination
* Project filtering
* Settings persistence (sidebar state, selected project)
* Auto-reconnection with exponential backoff
* GPU-accelerated animations
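
Auto-reconnection with exponential backoff can be sketched as a small delay function. The base delay, the cap, and the name `reconnectDelayMs` are assumptions for illustration; the viewer's actual values are not stated here.

```typescript
// Hypothetical parameters: base delay and cap are assumptions, not claude-mem's values.
function reconnectDelayMs(attempt: number, baseMs: number = 1000, maxMs: number = 30000): number {
  // Double the delay after each failed attempt (attempt 0 is the first retry).
  const delay = baseMs * Math.pow(2, Math.max(0, attempt));
  // Cap the delay so a long outage does not push retries out indefinitely.
  return Math.min(delay, maxMs);
}
```

A client would wait `reconnectDelayMs(attempt)` before reopening the SSE connection after an error, and reset `attempt` to zero once the connection opens again.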

**New Worker Endpoints** (8 additions):

```
GET /                    # Serves viewer HTML
GET /stream              # SSE real-time updates
GET /api/prompts         # Paginated user prompts
GET /api/observations    # Paginated observations
GET /api/summaries       # Paginated session summaries
GET /api/stats           # Database statistics
GET /api/settings        # User settings
POST /api/settings       # Save settings
```

**Database Enhancements**:

```typescript theme={null}
// New SessionStore methods for viewer
getRecentPrompts(limit, offset, project?)
getRecentObservations(limit, offset, project?)
getRecentSummaries(limit, offset, project?)
getStats()
getUniqueProjects()
```

**React Architecture**:

```
src/ui/viewer/
├── components/
│   ├── Header.tsx          # Navigation + stats
│   ├── Sidebar.tsx         # Project filter
│   ├── Feed.tsx            # Infinite scroll
│   └── cards/
│       ├── ObservationCard.tsx
│       ├── PromptCard.tsx
│       ├── SummaryCard.tsx
│       └── SkeletonCard.tsx
├── hooks/
│   ├── useSSE.ts           # Real-time events
│   ├── usePagination.ts    # Infinite scroll
│   ├── useSettings.ts      # Persistence
│   └── useStats.ts         # Statistics
└── utils/
    ├── merge.ts            # Data deduplication
    └── format.ts           # Display formatting
```
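
The deduplication step listed for `merge.ts` matters because the same record can arrive twice: once over SSE and once from a paginated fetch. A minimal sketch of that idea, assuming a hypothetical `FeedItem` shape (the real `merge.ts` may differ):

```typescript
// Hypothetical item shape; the real merge.ts may use different fields.
interface FeedItem {
  id: number;
  created_at_epoch: number;
}

// Merge live (SSE) items with paginated items, keeping one copy per id, newest first.
function mergeFeedItems<T extends FeedItem>(live: T[], paginated: T[]): T[] {
  const byId = new Map<number, T>();
  // Insert paginated items first so a live copy of the same id overwrites it.
  for (const item of paginated) byId.set(item.id, item);
  for (const item of live) byId.set(item.id, item);
  return Array.from(byId.values()).sort((a, b) => b.created_at_epoch - a.created_at_epoch);
}
```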

**Build Process**:

```typescript theme={null}
// esbuild bundles everything into single HTML file
esbuild.build({
  entryPoints: ['src/ui/viewer/index.tsx'],
  bundle: true,
  outfile: 'plugin/ui/viewer.html',
  loader: { '.tsx': 'tsx', '.woff2': 'dataurl' },
  define: { 'process.env.NODE_ENV': '"production"' },
});
```

**Why It Matters**: Users can now see exactly what's being captured in real-time, making the memory system transparent and debuggable.

### v5.0.3: Smart Install Caching (October 2025)

**The Problem**: `npm install` ran on every SessionStart (2-5 seconds)

**The Insight**: Dependencies rarely change between sessions

**The Solution**: Version-based caching

```typescript theme={null}
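// Check version marker before installing
import { existsSync, readFileSync, writeFileSync } from 'node:fs';

const currentVersion = getPackageVersion();
// The marker file does not exist on first run, so guard the read
const installedVersion = existsSync('.install-version')
  ? readFileSync('.install-version', 'utf-8').trim()
  : null;

if (currentVersion !== installedVersion) {
  // Only install if the version changed or no marker exists yet
  await runNpmInstall();
  writeFileSync('.install-version', currentVersion);
}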
```

**Cached Check Logic**:

1. Does `node_modules` exist?
2. Does `.install-version` match `package.json` version?
3. Is `better-sqlite3` present? (Legacy: now uses bun:sqlite which requires no installation)
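
The three checks above reduce to a single predicate. This is a sketch under stated assumptions: the `InstallState` shape and the name `needsInstall` are hypothetical, and the caller is assumed to gather the facts from the filesystem.

```typescript
// Hypothetical shape: the caller collects these facts before calling needsInstall.
interface InstallState {
  nodeModulesExists: boolean;
  installedVersion: string | null; // contents of .install-version, or null if missing
  packageVersion: string;          // version field from package.json
  nativeDepPresent: boolean;       // e.g. better-sqlite3 in the legacy setup
}

// Returns true when any of the three cached checks fails.
function needsInstall(state: InstallState): boolean {
  if (!state.nodeModulesExists) return true;                          // check 1
  if (state.installedVersion !== state.packageVersion) return true;   // check 2
  return !state.nativeDepPresent;                                     // check 3
}
```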

**Impact**:

* SessionStart hook: 2-5 seconds → 10ms (99.5% faster)
* Only installs on: first run, version change, missing deps
* Better Windows error messages with build tool help

### v5.0.2: Worker Health Checks (October 2025)

**What Changed**: More robust worker startup and monitoring

**New Features**:

```typescript theme={null}
// Health check endpoint
app.get('/health', (req, res) => {
  res.json({
    status: 'ok',
    uptime: process.uptime(),
    port: WORKER_PORT,
    memory: process.memoryUsage(),
  });
});

// Smart worker startup
async function ensureWorkerHealthy() {
  const healthy = await isWorkerHealthy(1000);
  if (!healthy) {
    await startWorker();
    await waitForWorkerHealth(10000);
  }
}
```

**Benefits**:

* Graceful degradation when worker is down
* Auto-recovery from crashes
* Better error messages for debugging

### v5.0.1: Stability Improvements (October 2025)

**What Changed**: Various bug fixes and stability enhancements

**Key Fixes**:

* Fixed race conditions in observation queue processing
* Improved error handling in SDK worker
* Better cleanup of stale worker processes
* Enhanced logging for debugging

### v5.0.0: Hybrid Search Architecture (October 2025)

**The Evolution**: SQLite FTS5 + Chroma vector search

**What We Added**:

```
┌─────────────────────────────────────────────────────────┐
│                    HYBRID SEARCH                         │
│                                                          │
│  Text Query → SQLite FTS5 (keyword matching)            │
│                      ↓                                   │
│            Chroma Vector Search (semantic)               │
│                      ↓                                   │
│              Merge + Re-rank Results                     │
└─────────────────────────────────────────────────────────┘
```

**New Dependencies**:

* `chromadb` - Vector database for semantic search
* Python 3.8+ - Required by chromadb

**MCP Tools Enhancement**:

```typescript theme={null}
// Chroma-backed semantic search
search_observations({
  query: "authentication bug",
  useSemanticSearch: true  // Uses Chroma
});

// Falls back to FTS5 if Chroma unavailable
```

**Why Hybrid**:

* FTS5: Fast keyword matching, no dependencies
* Chroma: Semantic understanding, finds related concepts
* Graceful degradation: Works without Chroma (FTS5 only)
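
This document does not say how the FTS5 and Chroma result lists are merged and re-ranked. One generic technique for combining ranked lists is reciprocal rank fusion (RRF), sketched below as an illustrative stand-in rather than claude-mem's actual logic. The function name and the constant `k = 60` are assumptions.

```typescript
// Reciprocal rank fusion (RRF): a generic way to merge ranked ID lists.
// Illustrative stand-in only; not claude-mem's actual merge logic.
function fuseRankings(rankings: number[][], k: number = 60): number[] {
  const scores = new Map<number, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // Rank is 1-based, so earlier positions contribute a larger share.
      const contribution = 1 / (k + index + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  }
  // Sort IDs by descending fused score.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Passing a single list returns it unchanged, which mirrors the FTS5-only fallback when Chroma is unavailable.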

**Trade-offs**:

* Added Python dependency (optional)
* Increased installation complexity
* Better search relevance

***

## MCP Architecture Simplification (December 2025)

### The Problem: Complex MCP Implementation

**Before:**

```
9+ MCP tools registered at session start:
- search_observations
- find_by_type
- find_by_file
- find_by_concept
- get_recent_context
- get_observation
- get_session
- get_prompt
- help

Problems:
- Overlapping operations (search_observations vs find_by_type)
- Complex parameter schemas (~2,500 tokens in tool definitions)
- No built-in workflow guidance
- High cognitive load for Claude (which tool to use?)
- Code size: ~2,718 lines in mcp-server.ts
```

**The Insight:** Progressive disclosure should be built into tool design itself, not something Claude has to remember.

### The Solution: 3-Layer Workflow

**After:**

```
4 MCP tools following 3-layer workflow:

1. __IMPORTANT - Workflow documentation (always visible)
   "3-LAYER WORKFLOW (ALWAYS FOLLOW):
    1. search(query) → Get index with IDs
    2. timeline(anchor=ID) → Get context
    3. get_observations([IDs]) → Fetch details
    NEVER fetch full details without filtering first."

2. search - Layer 1: Get index with IDs (~50-100 tokens/result)
3. timeline - Layer 2: Get chronological context
4. get_observations - Layer 3: Fetch full details (~500-1,000 tokens/result)

Benefits:
- Progressive disclosure enforced by tool structure
- No overlapping operations
- Simple schemas (additionalProperties: true)
- Clear workflow pattern
- Code size: ~312 lines in mcp-server.ts (88% reduction)
- ~10x token savings
```

### Migration: Skill-Based Search Removed

**Previously:** Used skill-based search

* mem-search skill invoked via natural language
* HTTP API called directly via curl
* Progressive disclosure through skill loading
* 17 skill documentation files

**Now:** Removed skill-based approach

* MCP-only architecture
* Native MCP protocol (better Claude integration)
* Works with both Claude Desktop and Claude Code
* Simpler to maintain (no skill files)
* All 19 mem-search skill files removed (\~2,744 lines)

### Key Architectural Changes

**MCP Server Refactor:**

Before:

```typescript theme={null}
// Complex parameter schemas
{
  name: "search_observations",
  inputSchema: {
    type: "object",
    properties: {
      query: { type: "string", description: "..." },
      type: { type: "array", items: { enum: [...] } },
      format: { enum: ["index", "full"] },
      limit: { type: "number", minimum: 1, maximum: 100 },
      // ... many more parameters
    }
  }
}
```

After:

```typescript theme={null}
// Simple schemas with workflow guidance
{
  name: "search",
  description: "Step 1: Search memory. Returns index with IDs.",
  inputSchema: {
    type: "object",
    properties: {},
    additionalProperties: true  // Accept any parameters
  }
}
```

**Workflow Enforcement:**

Before: Claude had to remember progressive disclosure pattern

After: Tool structure makes skipping steps difficult

* Can't get details without IDs from search
* Can't search without seeing \_\_IMPORTANT reminder
* Timeline provides middle ground (context without full details)
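
The three layers can be modelled in memory to show how each call narrows the next. This is a sketch, not the MCP server: the class, method, and field names are hypothetical, and a title substring match stands in for the real search.

```typescript
// Hypothetical in-memory model of the three layers; field names are assumptions.
interface StoredObservation {
  id: number;
  title: string;
  narrative: string;
  createdAtEpoch: number;
}

class MemoryIndex {
  private readonly observations: StoredObservation[];

  constructor(observations: StoredObservation[]) {
    this.observations = observations;
  }

  // Layer 1: cheap index of matching IDs and titles only.
  search(query: string): Array<{ id: number; title: string }> {
    const needle = query.toLowerCase();
    return this.observations
      .filter((o) => o.title.toLowerCase().includes(needle))
      .map(({ id, title }) => ({ id, title }));
  }

  // Layer 2: titles of the observations chronologically around an anchor ID.
  timeline(anchorId: number, radius: number = 1): Array<{ id: number; title: string }> {
    const sorted = [...this.observations].sort((a, b) => a.createdAtEpoch - b.createdAtEpoch);
    const position = sorted.findIndex((o) => o.id === anchorId);
    if (position === -1) return [];
    return sorted
      .slice(Math.max(0, position - radius), position + radius + 1)
      .map(({ id, title }) => ({ id, title }));
  }

  // Layer 3: full records, only for the IDs the caller selected.
  getObservations(ids: number[]): StoredObservation[] {
    const wanted = new Set(ids);
    return this.observations.filter((o) => wanted.has(o.id));
  }
}
```

Only the third call returns narratives, so the expensive payload is fetched after the first two calls have narrowed the candidates.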

### Impact

**Token Efficiency:**

```
Traditional: Fetch 20 observations upfront
→ 10,000-20,000 tokens
→ Only 2 observations relevant (90% waste)

3-Layer Workflow:
→ search (20 results): ~1,000-2,000 tokens
→ Review index, identify 3 relevant IDs
→ get_observations (3 IDs): ~1,500-3,000 tokens
→ Total: 2,500-5,000 tokens (50-75% savings)
```
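
The arithmetic behind that comparison can be written out directly. The sketch below uses midpoints of the per-result ranges quoted earlier (about 75 tokens per index entry and 750 per full observation); the function name and return shape are hypothetical.

```typescript
// Hypothetical helper for comparing upfront loading with the layered workflow.
interface TokenEstimate {
  upfront: number;  // fetch every result in full
  layered: number;  // index for every result, full details for the selected few
  savings: number;  // fraction saved, between 0 and 1
}

function estimateLayeredTokens(
  resultCount: number,
  selectedCount: number,
  indexTokensPerResult: number,
  detailTokensPerResult: number
): TokenEstimate {
  const upfront = resultCount * detailTokensPerResult;
  const layered = resultCount * indexTokensPerResult + selectedCount * detailTokensPerResult;
  return { upfront, layered, savings: 1 - layered / upfront };
}
```

With 20 results, 3 selected, and the midpoint costs, this gives 15,000 tokens upfront versus 3,750 layered, a 75% saving.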

**Code Simplicity:**

* MCP server: 2,718 lines → 312 lines (88% reduction)
* Removed: 19 skill files (\~2,744 lines)
* Net reduction: \~5,150 lines of code removed

**User Experience:**

* Same natural language interaction
* Better token efficiency
* Clearer architecture
* Works identically on Claude Desktop and Claude Code

### Design Philosophy

**Progressive Disclosure Through Structure:**

The 3-layer workflow embodies progressive disclosure at the architectural level:

1. **Layer 1 (Index)** - "What exists?" - Cheap survey of options
2. **Layer 2 (Timeline)** - "What was happening?" - Context around specific points
3. **Layer 3 (Details)** - "Tell me everything" - Full details only when justified

Each layer provides a decision point where Claude can:

* Stop if irrelevant
* Get more context if uncertain
* Dive deep if confident

This makes it structurally difficult to waste tokens.

***

## v1-v2: The Naive Approach

### The First Attempt: Dump Everything

**Architecture:**

```
PostToolUse Hook → Save raw tool outputs → Retrieve everything on startup
```

**What we learned:**

* ❌ Context pollution (thousands of tokens of irrelevant data)
* ❌ No compression (raw tool outputs are verbose)
* ❌ No search (had to scan everything linearly)
* ✅ Proved the concept: Memory across sessions is valuable

**Example of what went wrong:**

```
SessionStart loaded:
- 150 file read operations
- 80 grep searches
- 45 bash commands
- Total: ~35,000 tokens
- Relevant to current task: ~500 tokens (1.4%)
```

***

## v3: Smart Compression, Wrong Architecture

### The Breakthrough: AI-Powered Compression

**New idea:** Use Claude itself to compress observations

**Architecture:**

```
PostToolUse Hook → Queue observation → SDK Worker → AI compression → Store insights
```

**What we added:**

1. **Claude Agent SDK integration** - Use AI to compress observations
2. **Background worker** - Don't block main session
3. **Structured observations** - Extract facts, decisions, insights
4. **Session summaries** - Generate comprehensive summaries

**What worked:**

* ✅ Compression ratio: 10:1 to 100:1
* ✅ Semantic understanding (not just keyword matching)
* ✅ Background processing (hooks stayed fast)
* ✅ Search became useful

**What didn't work:**

* ❌ Still loaded everything upfront
* ❌ Session ID management was broken
* ❌ Aggressive cleanup interrupted summaries
* ❌ Multiple SDK sessions per Claude Code session

***

## The Key Realizations

### Realization 1: Progressive Disclosure

**Problem:** Even compressed observations can pollute context if you load them all.

**Insight:** Humans don't read everything before starting work. Why should AI?

**Solution:** Show an index first, fetch details on-demand.

```
❌ Old: Load 50 observations (8,500 tokens)
✅ New: Show index of 50 observations (800 tokens)
        Agent fetches 2-3 relevant ones (300 tokens)
        Total: 1,100 tokens vs 8,500 tokens
```

**Impact:**

* 87% reduction in context usage
* 100% relevance (only fetch what's needed)
* Agent autonomy (decides what's relevant)
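
An index entry only needs enough information for the agent to decide whether fetching the full record is worth the cost. A minimal sketch, assuming a hypothetical entry shape and a rough heuristic of four characters per token (an assumption, not a measured figure):

```typescript
// Hypothetical entry shape; field names are illustrative.
interface IndexEntry {
  id: number;
  type: string;
  title: string;
  narrative: string;
}

// Rough heuristic (an assumption): about four characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// One line per observation: ID, type, title, and the cost of fetching the details.
function formatIndexLine(entry: IndexEntry): string {
  return `#${entry.id} [${entry.type}] ${entry.title} (~${estimateTokens(entry.narrative)} tokens)`;
}
```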

### Realization 2: Session ID Chaos

**Problem:** SDK session IDs change on every turn.

**What we thought:**

```typescript theme={null}
// ❌ Wrong assumption
UserPromptSubmit → Capture session ID once → Use forever
```

**Reality:**

```typescript theme={null}
// ✅ Actual behavior
Turn 1: session_abc123
Turn 2: session_def456
Turn 3: session_ghi789
```

**Why this matters:**

* Can't resume sessions without tracking ID updates
* Session state gets lost between turns
* Observations get orphaned

**Solution:**

```typescript theme={null}
// Capture from system init message
for await (const msg of response) {
  if (msg.type === 'system' && msg.subtype === 'init') {
    sdkSessionId = msg.session_id;
    await updateSessionId(sessionId, sdkSessionId);
  }
}
```
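
The same tracking rule can be expressed as a pure function, which makes it easy to test without an SDK connection. The message shape below is modelled on the init message shown above; the function name is hypothetical.

```typescript
// Hypothetical message shape modelled on the init message described above.
interface SdkStreamMessage {
  type: string;
  subtype?: string;
  session_id?: string;
}

// Returns the most recent session ID seen in init messages,
// or the previously known ID when no init message carries one.
function latestSdkSessionId(messages: SdkStreamMessage[], previous: string | null): string | null {
  let current = previous;
  for (const msg of messages) {
    if (msg.type === 'system' && msg.subtype === 'init' && msg.session_id) {
      current = msg.session_id;
    }
  }
  return current;
}
```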

### Realization 3: Graceful vs Aggressive Cleanup

**v3 approach:**

```typescript theme={null}
// ❌ Aggressive: Kill worker immediately
SessionEnd → DELETE /worker/session → Worker stops
```

**Problems:**

* Summary generation interrupted mid-process
* Pending observations lost
* Race conditions everywhere

**v4 approach:**

```typescript theme={null}
// ✅ Graceful: Let worker finish
SessionEnd → Mark session complete → Worker finishes → Exit naturally
```

**Benefits:**

* Summaries complete successfully
* No lost observations
* Clean state transitions

**Code:**

```typescript theme={null}
// v3: Aggressive
async function sessionEnd(sessionId: string) {
  await fetch(`http://localhost:37777/sessions/${sessionId}`, {
    method: 'DELETE'
  });
}

// v4: Graceful
async function sessionEnd(sessionId: string) {
  await db.run(
    'UPDATE sdk_sessions SET completed_at = ? WHERE id = ?',
    [Date.now(), sessionId]
  );
}
```
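
On the worker side, graceful shutdown means the completion flag is only an exit signal once the queue is empty. A sketch of that rule, using a hypothetical in-memory stand-in for the SQLite-backed session and queue:

```typescript
// Hypothetical in-memory stand-in for the SQLite-backed session and queue.
interface WorkerSession {
  completedAt: number | null;
  pending: string[];
}

// One worker tick: process a queued item if there is one; exit only when
// the session is marked complete AND nothing is left to process.
function workerTick(session: WorkerSession, processed: string[]): 'continue' | 'exit' {
  const next = session.pending.shift();
  if (next !== undefined) {
    processed.push(next);
    return 'continue';
  }
  return session.completedAt === null ? 'continue' : 'exit';
}
```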

### Realization 4: One Session, Not Many

**Problem:** We were creating multiple SDK sessions per Claude Code session.

**What we were doing:**

```
Claude Code session → Create SDK session per observation → 100+ SDK sessions
```

**Reality should be:**

```
Claude Code session → ONE long-running SDK session → Streaming input
```

**Why this matters:**

* SDK maintains conversation state
* Context accumulates naturally
* Much more efficient

**Implementation:**

```typescript theme={null}
// ✅ Streaming Input Mode
async function* messageGenerator(): AsyncIterable<UserMessage> {
  // Initial prompt
  yield {
    role: "user",
    content: "You are a memory assistant..."
  };

  // Then continuously yield observations
  while (session.status === 'active') {
    const observations = await pollQueue();
    for (const obs of observations) {
      yield {
        role: "user",
        content: formatObservation(obs)
      };
    }
    await sleep(1000);
  }
}

const response = query({
  prompt: messageGenerator(),
  options: { maxTurns: 1000 }
});
```

***

## v4: The Architecture That Works

### The Core Design

```
┌─────────────────────────────────────────────────────────┐
│              CLAUDE CODE SESSION                         │
│  User → Claude → Tools (Read, Edit, Write, Bash)        │
│                    ↓                                     │
│              PostToolUse Hook                            │
│              (queues observation)                        │
└─────────────────────────────────────────────────────────┘
                     ↓ SQLite queue
┌─────────────────────────────────────────────────────────┐
│              SDK WORKER PROCESS                          │
│  ONE streaming session per Claude Code session          │
│                                                          │
│  AsyncIterable<UserMessage>                             │
│    → Yields observations from queue                     │
│    → SDK compresses via AI                              │
│    → Parses XML responses                               │
│    → Stores in database                                 │
└─────────────────────────────────────────────────────────┘
                     ↓ SQLite storage
┌─────────────────────────────────────────────────────────┐
│              NEXT SESSION                                │
│  SessionStart Hook                                       │
│    → Queries database                                    │
│    → Returns progressive disclosure index               │
│    → Agent fetches details via MCP                      │
└─────────────────────────────────────────────────────────┘
```

### The Five Hook Architecture

<Tabs>
  <Tab title="SessionStart">
    **Purpose:** Inject context from previous sessions

    **Timing:** When Claude Code starts

    **What it does:**

    * Queries last 10 session summaries
    * Formats as progressive disclosure index
    * Injects into context via stdout

    **Key change from v3:**

    * ✅ Index format (not full details)
    * ✅ Token counts visible
    * ✅ MCP search instructions included
  </Tab>

  <Tab title="UserPromptSubmit">
    **Purpose:** Initialize session tracking

    **Timing:** Before Claude processes prompt

    **What it does:**

    * Creates session record
    * Saves raw user prompt (v4.2.0+)
    * Starts worker if needed

    **Key change from v3:**

    * ✅ Stores raw prompts for search
    * ✅ Auto-starts worker service
  </Tab>

  <Tab title="PostToolUse">
    **Purpose:** Capture tool observations

    **Timing:** After every tool execution

    **What it does:**

    * Enqueues observation in database
    * Returns immediately

    **Key change from v3:**

    * ✅ Just enqueues (doesn't process)
    * ✅ Worker handles all AI calls
  </Tab>

  <Tab title="Summary">
    **Purpose:** Generate session summaries

    **Timing:** Worker-triggered (mid-session)

    **What it does:**

    * Gathers observations
    * Sends to Claude for summarization
    * Stores structured summary

    **Key change from v3:**

    * ✅ Multiple summaries per session
    * ✅ Summaries are checkpoints, not endings
  </Tab>

  <Tab title="SessionEnd">
    **Purpose:** Graceful cleanup

    **Timing:** When session ends

    **What it does:**

    * Marks session complete
    * Lets worker finish processing

    **Key change from v3:**

    * ✅ Graceful (not aggressive)
    * ✅ No DELETE requests
    * ✅ Worker finishes naturally
  </Tab>
</Tabs>

### Database Schema Evolution

**v3 schema:**

```sql theme={null}
-- Simple, flat structure
CREATE TABLE observations (
  id INTEGER PRIMARY KEY,
  session_id TEXT,
  text TEXT,
  created_at INTEGER
);
```

**v4 schema:**

```sql theme={null}
-- Rich, structured schema
CREATE TABLE observations (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  session_id TEXT NOT NULL,
  project TEXT NOT NULL,

  -- Progressive disclosure metadata
  title TEXT NOT NULL,
  subtitle TEXT,
  type TEXT NOT NULL,  -- decision, bugfix, feature, etc.

  -- Content
  narrative TEXT NOT NULL,
  facts TEXT,  -- JSON array

  -- Searchability
  concepts TEXT,  -- JSON array of tags
  files_read TEXT,  -- JSON array
  files_modified TEXT,  -- JSON array

  -- Timestamps
  created_at TEXT NOT NULL,
  created_at_epoch INTEGER NOT NULL,

  FOREIGN KEY(session_id) REFERENCES sdk_sessions(id)
);

-- FTS5 for full-text search
CREATE VIRTUAL TABLE observations_fts USING fts5(
  title, subtitle, narrative, facts, concepts,
  content=observations
);

-- Auto-sync triggers
CREATE TRIGGER observations_ai AFTER INSERT ON observations BEGIN
  INSERT INTO observations_fts(rowid, title, subtitle, narrative, facts, concepts)
  VALUES (new.id, new.title, new.subtitle, new.narrative, new.facts, new.concepts);
END;
```

**What changed:**

* ✅ Structured fields (title, subtitle, type)
* ✅ FTS5 full-text search
* ✅ Project-scoped queries
* ✅ Rich metadata for progressive disclosure

### Worker Service Redesign

**v3 worker:**

```typescript theme={null}
// Multiple short SDK sessions
app.post('/process', async (req, res) => {
  const response = await query({
    prompt: buildPrompt(req.body),
    options: { maxTurns: 1 }
  });

  for await (const msg of response) {
    // Process single observation
  }

  res.json({ success: true });
});
```

**v4 worker:**

```typescript theme={null}
// ONE long-running SDK session
async function runWorker(sessionId: string) {
  const response = query({
    prompt: messageGenerator(),  // AsyncIterable
    options: { maxTurns: 1000 }
  });

  for await (const msg of response) {
    if (msg.type === 'text') {
      parseObservations(msg.content);
      parseSummaries(msg.content);
    }
  }
}
```

**Benefits:**

* Maintains conversation state
* SDK handles context automatically
* More efficient (fewer API calls)
* Natural multi-turn flow
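
The `parseObservations` step depends on the response format, which this document does not show. The sketch below assumes hypothetical tag names (`<observation>`, `<title>`, `<narrative>`) purely to illustrate extracting structured blocks from free-form model output.

```typescript
// Hypothetical tag names; the real response format is not shown in this document.
interface ParsedObservation {
  title: string;
  narrative: string;
}

// Return the trimmed text inside the first <tag>...</tag> pair, or '' if absent.
function extractTag(xml: string, tag: string): string {
  const match = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`).exec(xml);
  return match?.[1]?.trim() ?? '';
}

// Find every <observation> block and pull out its fields, ignoring surrounding prose.
function parseObservationBlocks(text: string): ParsedObservation[] {
  const blocks = text.match(/<observation>[\s\S]*?<\/observation>/g) ?? [];
  return blocks.map((block) => ({
    title: extractTag(block, 'title'),
    narrative: extractTag(block, 'narrative'),
  }));
}
```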

***

## Critical Fixes Along the Way

### Fix 1: Context Injection Pollution (v4.3.1)

**Problem:** SessionStart hook output polluted with npm install logs

```bash theme={null}
# Hook output contained:
npm WARN deprecated ...
npm WARN deprecated ...
{"hookSpecificOutput": {"additionalContext": "..."}}
```

**Why it broke:**

* Claude Code expects clean JSON or plain text
* stderr/stdout from npm install mixed with hook output
* Context didn't inject properly

**Solution:**

```json theme={null}
{
  "command": "npm install --loglevel=silent && node context-hook.js"
}
```

**Result:** Clean JSON output, context injection works

### Fix 2: Double Shebang Issue (v4.3.1)

**Problem:** Hook executables had duplicate shebangs

```javascript theme={null}
#!/usr/bin/env node
#!/usr/bin/env node  // ← Duplicate!

// Rest of code...
```

**Why it happened:**

* Source files had shebang
* esbuild added another shebang during build

**Solution:**

```typescript theme={null}
// Remove shebangs from source files
// Let esbuild add them during build
```

**Result:** Clean executables, no parsing errors
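
A build-time guard can catch a regression if a shebang is ever reintroduced in a source file. This is a hypothetical helper, not part of the described fix:

```typescript
// Hypothetical build-time guard: keep only the first of any consecutive leading shebang lines.
function dedupeShebang(source: string): string {
  const lines = source.split('\n');
  let firstCodeLine = 0;
  // Count how many lines at the top of the file start with '#!'.
  while (firstCodeLine < lines.length && (lines[firstCodeLine] ?? '').startsWith('#!')) {
    firstCodeLine++;
  }
  if (firstCodeLine <= 1) return source; // zero or one shebang: nothing to do
  return [lines[0], ...lines.slice(firstCodeLine)].join('\n');
}
```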

### Fix 3: FTS5 Injection Vulnerability (v4.2.3)

**Problem:** User input passed directly to FTS5 query

```typescript theme={null}
// ❌ Vulnerable
const results = db.query(
  `SELECT * FROM observations_fts WHERE observations_fts MATCH '${userQuery}'`
);
```

**Attack:**

```typescript theme={null}
userQuery = "'; DROP TABLE observations; --"
```

**Solution:**

```typescript theme={null}
// ✅ Safe: Use parameterized queries
const results = db.query(
  'SELECT * FROM observations_fts WHERE observations_fts MATCH ?',
  [userQuery]
);
```
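
Parameter binding stops SQL injection, but the bound string is still parsed as FTS5 query syntax, so stray quotes or operators in user input can raise a query error. One way to handle this is to quote each term: FTS5 treats a double-quoted string as a literal, with embedded double quotes escaped by doubling. The function name below is hypothetical.

```typescript
// Turn free-form user input into an FTS5 query of quoted terms.
// Each whitespace-separated term becomes a double-quoted string,
// with embedded double quotes doubled as FTS5 requires.
function toFts5Query(userInput: string): string {
  return userInput
    .split(/\s+/)
    .filter((term) => term.length > 0)
    .map((term) => `"${term.replace(/"/g, '""')}"`)
    .join(' ');
}
```

The result is still passed as a bound parameter. An empty result means there is nothing to search for, so the caller should skip the query in that case.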

### Fix 4: NOT NULL Constraint Violation (v4.2.8)

**Problem:** Session creation failed when prompt was empty

```sql theme={null}
INSERT INTO sdk_sessions (claude_session_id, user_prompt, ...)
VALUES ('abc123', NULL, ...)  -- ❌ user_prompt is NOT NULL
```

**Solution:**

```typescript theme={null}
// Allow NULL user_prompts
user_prompt: input.prompt ?? null
```

**Schema change:**

```sql theme={null}
-- Before
user_prompt TEXT NOT NULL

-- After
user_prompt TEXT  -- Nullable
```

***

## Performance Improvements

### Optimization 1: Prepared Statements

**Before:**

```typescript theme={null}
for (const obs of observations) {
  db.run(`INSERT INTO observations (...) VALUES (?, ?, ...)`, [obs.id, obs.text, ...]);
}
```

**After:**

```typescript theme={null}
const stmt = db.prepare(`INSERT INTO observations (...) VALUES (?, ?, ...)`);
for (const obs of observations) {
  stmt.run([obs.id, obs.text, ...]);
}
stmt.finalize();
```

**Impact:** 5x faster bulk inserts

### Optimization 2: FTS5 Indexing

**Before:**

```typescript theme={null}
// Manual full-text search
const results = db.query(
  `SELECT * FROM observations WHERE text LIKE '%${query}%'`
);
```

**After:**

```typescript theme={null}
// FTS5 virtual table
const results = db.query(
  `SELECT * FROM observations_fts WHERE observations_fts MATCH ?`,
  [query]
);
```

**Impact:** 100x faster searches on large datasets

### Optimization 3: Index Format Default

**Before:**

```typescript theme={null}
// Always return full observations
search_observations({ query: "hooks" });
// Returns: 5,000 tokens
```

**After:**

```typescript theme={null}
// Default to index format
search_observations({ query: "hooks", format: "index" });
// Returns: 200 tokens

// Fetch full only when needed
search_observations({ query: "hooks", format: "full", limit: 1 });
// Returns: 150 tokens
```

**Impact:** 25x reduction in average search result size

***

## What We Learned

### Lesson 1: Context is Precious

**Principle:** Every token you put in the context window costs attention.

**Application:**

* Progressive disclosure reduces waste by 87%
* Index-first approach gives agent control
* Token counts make costs visible

### Lesson 2: Session State is Complicated

**Principle:** Distributed state is hard. The SDK handles it better than we can.

**Application:**

* Use SDK's built-in session resumption
* Don't try to manually reconstruct state
* Track session IDs from init messages

### Lesson 3: Graceful Beats Aggressive

**Principle:** Let processes finish their work before terminating.

**Application:**

* Graceful cleanup prevents data loss
* Workers finish important operations
* Clean state transitions reduce bugs

### Lesson 4: AI is the Compressor

**Principle:** Don't compress manually. Let AI do semantic compression.

**Application:**

* 10:1 to 100:1 compression ratios
* Semantic understanding, not keyword extraction
* Structured outputs (XML parsing)

### Lesson 5: Progressive Everything

**Principle:** Show metadata first, fetch details on-demand.

**Application:**

* Progressive disclosure in context injection
* Index format in search results
* Layer 1 (titles) → Layer 2 (summaries) → Layer 3 (full details)

***

## The Road Ahead

### Planned: Adaptive Index Size

```typescript theme={null}
SessionStart({ source: "startup" }):
  → Show last 10 sessions (normal)

SessionStart({ source: "resume" }):
  → Show only current session (minimal)

SessionStart({ source: "compact" }):
  → Show last 20 sessions (comprehensive)
```
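
Since this feature is only planned, the following is a sketch of how the outline above could look as a typed function. The type and function names are hypothetical; the counts mirror the outline.

```typescript
// Hypothetical mapping for the planned behaviour; counts mirror the outline above.
type SessionStartSource = 'startup' | 'resume' | 'compact';

// How many recent sessions to include in the injected index.
function sessionsToShow(source: SessionStartSource): number {
  switch (source) {
    case 'startup':
      return 10; // normal
    case 'resume':
      return 1;  // minimal: current session only
    case 'compact':
      return 20; // comprehensive
  }
}
```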

### Planned: Relevance Scoring

```typescript theme={null}
// Use embeddings to pre-sort index by semantic relevance
search_observations({
  query: "authentication bug",
  sort: "relevance"  // Based on embeddings
});
```

### Planned: Multi-Project Context

```typescript theme={null}
// Cross-project pattern recognition
search_observations({
  query: "API rate limiting",
  projects: ["api-gateway", "user-service", "billing-service"]
});
```

### Planned: Collaborative Memory

```typescript theme={null}
// Team-shared observations (optional)
createObservation({
  title: "Rate limit: 100 req/min",
  scope: "team"  // vs "user"
});
```

***

## Migration Guide: v3 → v5

### Step 1: Backup Database

```bash theme={null}
cp ~/.claude-mem/claude-mem.db ~/.claude-mem/claude-mem-v3-backup.db
```

### Step 2: Pull Latest Marketplace Source

```bash theme={null}
cd ~/.claude/plugins/marketplaces/thedotmack
git pull
```

### Step 3: Update Plugin

```bash theme={null}
# Run inside Claude Code (slash command, not a shell command)
/plugin update claude-mem
```

**What happens automatically:**

* Dependencies update (including new ones like chromadb for v5.0.0+)
* Database schema migrations run automatically
* Worker service restarts with new code
* Smart install caching activates (v5.0.3+)

### Step 4: Test

```bash theme={null}
# Start Claude Code
claude

# Check that context is injected
# (Should see progressive disclosure index with v5 viewer link)

# Open viewer UI (v5.1.0+)
open http://localhost:37777

# Submit a prompt and watch real-time updates in viewer
```

### Step 5: Explore New Features

```bash theme={null}
# View memory stream in browser (v5.1.0+)
open http://localhost:37777

# Toggle theme (v5.1.2+)
# Click theme button in viewer header

# Check worker health
npm run worker:status
curl http://localhost:37777/health
```

***

## Key Metrics

### v3 Performance

| Metric                    | Value                  |
| ------------------------- | ---------------------- |
| Context usage per session | \~25,000 tokens        |
| Relevant context          | \~2,000 tokens (8%)    |
| Hook execution time       | \~200ms                |
| Search latency            | \~500ms (LIKE queries) |

### v4 Performance

| Metric                    | Value                 |
| ------------------------- | --------------------- |
| Context usage per session | \~1,100 tokens        |
| Relevant context          | \~1,100 tokens (100%) |
| Hook execution time       | \~45ms                |
| Search latency            | \~15ms (FTS5)         |

### v5 Performance

| Metric                    | Value                            |
| ------------------------- | -------------------------------- |
| Context usage per session | \~1,100 tokens                   |
| Relevant context          | \~1,100 tokens (100%)            |
| Hook execution time       | \~10ms (cached install)          |
| Search latency            | \~12ms (FTS5) or \~25ms (hybrid) |
| Viewer UI load time       | \~50ms (bundled HTML)            |
| SSE update latency        | \~5ms (real-time)                |

**v3 → v4 Improvements:**

* 96% reduction in context usage (\~25,000 → \~1,100 tokens)
* 12x increase in relevance
* 4x faster hooks
* 33x faster search

**v4 → v5 Improvements:**

* 78% faster hooks (smart caching)
* Real-time visualization (viewer UI)
* Better search relevance (hybrid)
* Enhanced UX (theme toggle, persistence)
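
The percentages and multipliers above follow from the table values. A short sketch of the arithmetic, with hypothetical helper names:

```typescript
// Percentage reduction from `before` to `after`, rounded to a whole percent.
function percentReduction(before: number, after: number): number {
  return Math.round(((before - after) / before) * 100);
}

// Speedup factor from `before` to `after`, rounded to one decimal place.
function speedup(before: number, after: number): number {
  return Math.round((before / after) * 10) / 10;
}
```

For example, `percentReduction(25000, 1100)` gives 96 (context usage, v3 to v4), `speedup(500, 15)` gives 33.3 (search latency, v3 to v4), and `percentReduction(45, 10)` gives 78 (hook time, v4 to v5).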

***

## Conclusion

The journey from v3 to v5 was about understanding these fundamental truths:

1. **Context is finite** - Progressive disclosure respects attention budget
2. **AI is the compressor** - Semantic understanding beats keyword extraction
3. **Agents are smart** - Let them decide what to fetch
4. **State is hard** - Use SDK's built-in mechanisms
5. **Graceful wins** - Let processes finish cleanly

The result is a memory system that's both powerful and invisible. Users never notice it working - Claude just gets smarter over time.

**v5 adds visibility**: Now users CAN see the memory system working if they want (via viewer UI), but it's still non-intrusive.

***

## Further Reading

* [Progressive Disclosure](progressive-disclosure) - The philosophy behind v4
* [Hooks Architecture](hooks-architecture) - How hooks power the system
* [Context Engineering](context-engineering) - Foundational principles
* [Worker Service](/architecture/worker-service) - Real-time visualization (v5.1.0+)

***

*This architecture evolution reflects hundreds of hours of experimentation, dozens of dead ends, and the invaluable experience of real-world usage. v5 is the architecture that emerged from understanding what actually works - and making it visible to users.*
