
Architecture Evolution: The Journey from v3 to v5

The Problem We Solved

Goal: Create a memory system that makes Claude smarter across sessions without the user noticing it exists.

Challenge: How do you observe AI agent behavior, compress it intelligently, and serve it back at the right time, all without slowing down or interfering with the main workflow?

This is the story of how claude-mem evolved from a simple idea to a production-ready system, and of the key architectural decisions that made it work.

v5.x: Maturity and User Experience

After establishing the solid v4 architecture, v5.x focused on user experience, visualization, and polish.

v5.1.2: Theme Toggle (November 2025)

What Changed: Added a light/dark mode theme toggle to the viewer UI.
New Features:
  • User-selectable theme preference (light, dark, system)
  • Persistent theme settings in localStorage
  • Smooth theme transitions
  • System preference detection
Implementation:
// Theme context with persistence
import { createContext, useEffect, useState, type ReactNode } from 'react';

type Theme = 'light' | 'dark' | 'system';

const ThemeContext = createContext<{
  theme: Theme;
  setTheme: (t: Theme) => void;
} | null>(null);

const ThemeProvider = ({ children }: { children: ReactNode }) => {
  const [theme, setTheme] = useState<Theme>(() => {
    return (localStorage.getItem('claude-mem-theme') as Theme | null) ?? 'system';
  });

  useEffect(() => {
    localStorage.setItem('claude-mem-theme', theme);
  }, [theme]);

  return (
    <ThemeContext.Provider value={{ theme, setTheme }}>
      {children}
    </ThemeContext.Provider>
  );
};
Why It Matters: Users working in different lighting conditions can now customize the viewer for comfort.
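The feature list mentions system preference detection, which the snippet above doesn't show. A minimal sketch of how the 'system' value could resolve to a concrete theme; `resolveTheme` is an illustrative helper (not the shipped code), with the dark-mode check injected so the logic stays testable outside a browser:

```typescript
type Theme = 'light' | 'dark' | 'system';

// Resolve the user's preference to an effective theme.
// `prefersDark` would come from matchMedia in the real viewer.
function resolveTheme(preference: Theme, prefersDark: boolean): 'light' | 'dark' {
  if (preference === 'system') return prefersDark ? 'dark' : 'light';
  return preference;
}

// In the browser this would be driven by:
// window.matchMedia('(prefers-color-scheme: dark)').matches
```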

v5.1.1: PM2 Windows Fix (November 2025)

The Problem: PM2 startup failed on Windows with an ENOENT error.
Root Cause:
// ❌ Failed on Windows - PM2 not in PATH
execSync('pm2 start ecosystem.config.cjs');
The Fix:
// ✅ Use full path to PM2 binary
const PM2_PATH = join(PLUGIN_ROOT, 'node_modules', '.bin', 'pm2');
execSync(`"${PM2_PATH}" start "${ECOSYSTEM_CONFIG}"`);
Impact: Cross-platform compatibility restored; Windows users can now use claude-mem without issues.

v5.1.0: Web-Based Viewer UI (October 2025)

The Breakthrough: Real-time visualization of the memory stream.
What We Built:
  • React-based web UI at http://localhost:37777
  • Server-Sent Events (SSE) for real-time updates
  • Infinite scroll pagination
  • Project filtering
  • Settings persistence (sidebar state, selected project)
  • Auto-reconnection with exponential backoff
  • GPU-accelerated animations
New Worker Endpoints (8 additions):
GET /                    # Serves viewer HTML
GET /stream              # SSE real-time updates
GET /api/prompts         # Paginated user prompts
GET /api/observations    # Paginated observations
GET /api/summaries       # Paginated session summaries
GET /api/stats           # Database statistics
GET /api/settings        # User settings
POST /api/settings       # Save settings
Database Enhancements:
// New SessionStore methods for viewer
getRecentPrompts(limit, offset, project?)
getRecentObservations(limit, offset, project?)
getRecentSummaries(limit, offset, project?)
getStats()
getUniqueProjects()
React Architecture:
src/ui/viewer/
├── components/
│   ├── Header.tsx          # Navigation + stats
│   ├── Sidebar.tsx         # Project filter
│   ├── Feed.tsx            # Infinite scroll
│   └── cards/
│       ├── ObservationCard.tsx
│       ├── PromptCard.tsx
│       ├── SummaryCard.tsx
│       └── SkeletonCard.tsx
├── hooks/
│   ├── useSSE.ts           # Real-time events
│   ├── usePagination.ts    # Infinite scroll
│   ├── useSettings.ts      # Persistence
│   └── useStats.ts         # Statistics
└── utils/
    ├── merge.ts            # Data deduplication
    └── format.ts           # Display formatting
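The auto-reconnection behavior behind useSSE.ts can be described as a backoff schedule plus EventSource re-wiring. This is an illustrative sketch, not the shipped hook: the delay schedule is a pure function, and the browser wiring is shown as comments:

```typescript
// Exponential backoff with a cap: 1s, 2s, 4s, 8s, ... up to 30s.
function backoffDelay(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Usage in the browser (sketch):
// let attempt = 0;
// function connect() {
//   const es = new EventSource('/stream');
//   es.onopen = () => { attempt = 0; };          // reset on success
//   es.onerror = () => {
//     es.close();
//     setTimeout(connect, backoffDelay(attempt++));
//   };
// }
```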
Build Process:
// esbuild bundles everything into single HTML file
esbuild.build({
  entryPoints: ['src/ui/viewer/index.tsx'],
  bundle: true,
  outfile: 'plugin/ui/viewer.html',
  loader: { '.tsx': 'tsx', '.woff2': 'dataurl' },
  define: { 'process.env.NODE_ENV': '"production"' },
});
Why It Matters: Users can now see exactly what’s being captured in real-time, making the memory system transparent and debuggable.

v5.0.3: Smart Install Caching (October 2025)

The Problem: npm install ran on every SessionStart (2-5 seconds).
The Insight: Dependencies rarely change between sessions.
The Solution: Version-based caching.
// Check version marker before installing
const currentVersion = getPackageVersion();
const installedVersion = existsSync('.install-version')
  ? readFileSync('.install-version', 'utf-8')
  : null; // first run: no marker yet

if (currentVersion !== installedVersion) {
  // Only install if the version changed (or the marker is missing)
  await runNpmInstall();
  writeFileSync('.install-version', currentVersion);
}
Cached Check Logic:
  1. Does node_modules exist?
  2. Does .install-version match package.json version?
  3. Is better-sqlite3 present?
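The three checks can be sketched as one function. The directory layout and marker filename here are assumptions drawn from the description above, not the shipped implementation:

```typescript
import { existsSync, readFileSync } from 'node:fs';
import { join } from 'node:path';

// Returns true only when all three cache conditions hold.
function isInstallCached(pluginRoot: string, currentVersion: string): boolean {
  const nodeModules = join(pluginRoot, 'node_modules');
  const marker = join(pluginRoot, '.install-version');
  const sqlite = join(nodeModules, 'better-sqlite3');

  if (!existsSync(nodeModules)) return false;                                  // 1. node_modules exists?
  if (!existsSync(marker)) return false;                                       // 2. marker present?
  if (readFileSync(marker, 'utf-8').trim() !== currentVersion) return false;   //    ...and matches?
  return existsSync(sqlite);                                                   // 3. better-sqlite3 present?
}
```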
Impact:
  • SessionStart hook: 2-5 seconds → 10ms (99.5% faster)
  • Only installs on: first run, version change, missing deps
  • Better Windows error messages with build tool help

v5.0.2: Worker Health Checks (October 2025)

What Changed: More robust worker startup and monitoring.
New Features:
// Health check endpoint
app.get('/health', (req, res) => {
  res.json({
    status: 'ok',
    uptime: process.uptime(),
    port: WORKER_PORT,
    memory: process.memoryUsage(),
  });
});

// Smart worker startup
async function ensureWorkerHealthy() {
  const healthy = await isWorkerHealthy(1000);
  if (!healthy) {
    await startWorker();
    await waitForWorkerHealth(10000);
  }
}
Benefits:
  • Graceful degradation when worker is down
  • Auto-recovery from crashes
  • Better error messages for debugging
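The isWorkerHealthy(timeout) call above isn't shown in full; a plausible sketch using fetch against the /health endpoint with an AbortController-based timeout (the exact shipped implementation may differ):

```typescript
// Probe the worker's /health endpoint, treating any failure
// (connection refused, timeout, non-2xx) as "unhealthy".
async function isWorkerHealthy(timeoutMs: number, port = 37777): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(`http://localhost:${port}/health`, {
      signal: controller.signal,
    });
    return res.ok;
  } catch {
    return false; // refused, aborted, or network error
  } finally {
    clearTimeout(timer);
  }
}
```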

v5.0.1: Stability Improvements (October 2025)

What Changed: Various bug fixes and stability enhancements.
Key Fixes:
  • Fixed race conditions in observation queue processing
  • Improved error handling in SDK worker
  • Better cleanup of stale PM2 processes
  • Enhanced logging for debugging

v5.0.0: Hybrid Search Architecture (October 2025)

The Evolution: SQLite FTS5 + Chroma vector search.
What We Added:
┌─────────────────────────────────────────────────────────┐
│                    HYBRID SEARCH                         │
│                                                          │
│  Text Query → SQLite FTS5 (keyword matching)            │
│                      ↓                                   │
│            Chroma Vector Search (semantic)               │
│                      ↓                                   │
│              Merge + Re-rank Results                     │
└─────────────────────────────────────────────────────────┘
New Dependencies:
  • chromadb - Vector database for semantic search
  • Python 3.8+ - Required by chromadb
MCP Tools Enhancement:
// Chroma-backed semantic search
search_observations({
  query: "authentication bug",
  useSemanticSearch: true  // Uses Chroma
});

// Falls back to FTS5 if Chroma unavailable
Why Hybrid:
  • FTS5: Fast keyword matching, no dependencies
  • Chroma: Semantic understanding, finds related concepts
  • Graceful degradation: Works without Chroma (FTS5 only)
Trade-offs:
  • Added Python dependency (optional)
  • Increased installation complexity
  • Better search relevance
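The "Merge + Re-rank Results" step in the diagram isn't specified. One common approach is reciprocal rank fusion (RRF), sketched here as an assumption rather than the shipped strategy: each result list contributes a score that decays with rank, and IDs appearing in both lists rise to the top:

```typescript
// Reciprocal rank fusion: score(id) = Σ 1 / (k + rank + 1) over both lists.
// k dampens the influence of top ranks; 60 is a conventional default.
function mergeResults(ftsIds: number[], vectorIds: number[], k = 60): number[] {
  const scores = new Map<number, number>();
  for (const [rank, id] of ftsIds.entries()) {
    scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
  }
  for (const [rank, id] of vectorIds.entries()) {
    scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```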

v1-v2: The Naive Approach

The First Attempt: Dump Everything

Architecture:
PostToolUse Hook → Save raw tool outputs → Retrieve everything on startup
What we learned:
  • ❌ Context pollution (thousands of tokens of irrelevant data)
  • ❌ No compression (raw tool outputs are verbose)
  • ❌ No search (had to scan everything linearly)
  • ✅ Proved the concept: Memory across sessions is valuable
Example of what went wrong:
SessionStart loaded:
- 150 file read operations
- 80 grep searches
- 45 bash commands
- Total: ~35,000 tokens
- Relevant to current task: ~500 tokens (1.4%)

v3: Smart Compression, Wrong Architecture

The Breakthrough: AI-Powered Compression

New idea: Use Claude itself to compress observations.
Architecture:
PostToolUse Hook → Queue observation → SDK Worker → AI compression → Store insights
What we added:
  1. Claude Agent SDK integration - Use AI to compress observations
  2. Background worker - Don’t block main session
  3. Structured observations - Extract facts, decisions, insights
  4. Session summaries - Generate comprehensive summaries
What worked:
  • ✅ Compression ratio: 10:1 to 100:1
  • ✅ Semantic understanding (not just keyword matching)
  • ✅ Background processing (hooks stayed fast)
  • ✅ Search became useful
What didn’t work:
  • ❌ Still loaded everything upfront
  • ❌ Session ID management was broken
  • ❌ Aggressive cleanup interrupted summaries
  • ❌ Multiple SDK sessions per Claude Code session

The Key Realizations

Realization 1: Progressive Disclosure

Problem: Even compressed observations can pollute context if you load them all.
Insight: Humans don’t read everything before starting work. Why should AI?
Solution: Show an index first, fetch details on-demand.
❌ Old: Load 50 observations (8,500 tokens)
✅ New: Show index of 50 observations (800 tokens)
        Agent fetches 2-3 relevant ones (300 tokens)
        Total: 1,100 tokens vs 8,500 tokens
Impact:
  • 87% reduction in context usage
  • 100% relevance (only fetch what’s needed)
  • Agent autonomy (decides what’s relevant)

Realization 2: Session ID Chaos

Problem: SDK session IDs change on every turn.
What we thought:
// ❌ Wrong assumption
UserPromptSubmit → Capture session ID once → Use forever
Reality:
// ✅ Actual behavior
Turn 1: session_abc123
Turn 2: session_def456
Turn 3: session_ghi789
Why this matters:
  • Can’t resume sessions without tracking ID updates
  • Session state gets lost between turns
  • Observations get orphaned
Solution:
// Capture from system init message
for await (const msg of response) {
  if (msg.type === 'system' && msg.subtype === 'init') {
    sdkSessionId = msg.session_id;
    await updateSessionId(sessionId, sdkSessionId);
  }
}

Realization 3: Graceful vs Aggressive Cleanup

v3 approach:
// ❌ Aggressive: Kill worker immediately
SessionEnd → DELETE /worker/session → Worker stops
Problems:
  • Summary generation interrupted mid-process
  • Pending observations lost
  • Race conditions everywhere
v4 approach:
// ✅ Graceful: Let worker finish
SessionEnd → Mark session complete → Worker finishes → Exit naturally
Benefits:
  • Summaries complete successfully
  • No lost observations
  • Clean state transitions
Code:
// v3: Aggressive
async function sessionEnd(sessionId: string) {
  await fetch(`http://localhost:37777/sessions/${sessionId}`, {
    method: 'DELETE'
  });
}

// v4: Graceful
async function sessionEnd(sessionId: string) {
  await db.run(
    'UPDATE sdk_sessions SET completed_at = ? WHERE id = ?',
    [Date.now(), sessionId]
  );
}

Realization 4: One Session, Not Many

Problem: We were creating multiple SDK sessions per Claude Code session.
What we thought:
Claude Code session → Create SDK session per observation → 100+ SDK sessions
Reality should be:
Claude Code session → ONE long-running SDK session → Streaming input
Why this matters:
  • SDK maintains conversation state
  • Context accumulates naturally
  • Much more efficient
Implementation:
// ✅ Streaming Input Mode
async function* messageGenerator(): AsyncIterable<UserMessage> {
  // Initial prompt
  yield {
    role: "user",
    content: "You are a memory assistant..."
  };

  // Then continuously yield observations
  while (session.status === 'active') {
    const observations = await pollQueue();
    for (const obs of observations) {
      yield {
        role: "user",
        content: formatObservation(obs)
      };
    }
    await sleep(1000);
  }
}

const response = query({
  prompt: messageGenerator(),
  options: { maxTurns: 1000 }
});

v4: The Architecture That Works

The Core Design

┌─────────────────────────────────────────────────────────┐
│              CLAUDE CODE SESSION                         │
│  User → Claude → Tools (Read, Edit, Write, Bash)        │
│                    ↓                                     │
│              PostToolUse Hook                            │
│              (queues observation)                        │
└─────────────────────────────────────────────────────────┘
                     ↓ SQLite queue
┌─────────────────────────────────────────────────────────┐
│              SDK WORKER PROCESS                          │
│  ONE streaming session per Claude Code session          │
│                                                          │
│  AsyncIterable<UserMessage>                             │
│    → Yields observations from queue                     │
│    → SDK compresses via AI                              │
│    → Parses XML responses                               │
│    → Stores in database                                 │
└─────────────────────────────────────────────────────────┘
                     ↓ SQLite storage
┌─────────────────────────────────────────────────────────┐
│              NEXT SESSION                                │
│  SessionStart Hook                                       │
│    → Queries database                                    │
│    → Returns progressive disclosure index               │
│    → Agent fetches details via MCP                      │
└─────────────────────────────────────────────────────────┘

The Five Hook Architecture

  • SessionStart
  • UserPromptSubmit
  • PostToolUse
  • Summary
  • SessionEnd
SessionStart
Purpose: Inject context from previous sessions.
Timing: When Claude Code starts.
What it does:
  • Queries last 10 session summaries
  • Formats as progressive disclosure index
  • Injects into context via stdout
Key change from v3:
  • ✅ Index format (not full details)
  • ✅ Token counts visible
  • ✅ MCP search instructions included
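The hook's output can be sketched end-to-end: build an index of recent summaries, then print it in the hookSpecificOutput JSON shape Claude Code reads from stdout. The `SummaryRow` shape and `buildIndex` helper are illustrative assumptions, not the shipped code:

```typescript
interface SummaryRow { id: number; title: string; tokens: number }

// Format summaries as a compact progressive-disclosure index:
// one line per session, with visible token counts.
function buildIndex(summaries: SummaryRow[]): string {
  const lines = summaries.map(
    (s) => `- [#${s.id}] ${s.title} (~${s.tokens} tokens)`
  );
  lines.push('Use the claude-mem MCP search tools to fetch full details.');
  return lines.join('\n');
}

function emitContext(summaries: SummaryRow[]): void {
  // Claude Code parses this JSON from stdout and injects additionalContext
  console.log(JSON.stringify({
    hookSpecificOutput: { additionalContext: buildIndex(summaries) },
  }));
}
```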

Database Schema Evolution

v3 schema:
-- Simple, flat structure
CREATE TABLE observations (
  id INTEGER PRIMARY KEY,
  session_id TEXT,
  text TEXT,
  created_at INTEGER
);
v4 schema:
-- Rich, structured schema
CREATE TABLE observations (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  session_id TEXT NOT NULL,
  project TEXT NOT NULL,

  -- Progressive disclosure metadata
  title TEXT NOT NULL,
  subtitle TEXT,
  type TEXT NOT NULL,  -- decision, bugfix, feature, etc.

  -- Content
  narrative TEXT NOT NULL,
  facts TEXT,  -- JSON array

  -- Searchability
  concepts TEXT,  -- JSON array of tags
  files_read TEXT,  -- JSON array
  files_modified TEXT,  -- JSON array

  -- Timestamps
  created_at TEXT NOT NULL,
  created_at_epoch INTEGER NOT NULL,

  FOREIGN KEY(session_id) REFERENCES sdk_sessions(id)
);

-- FTS5 for full-text search
CREATE VIRTUAL TABLE observations_fts USING fts5(
  title, subtitle, narrative, facts, concepts,
  content=observations
);

-- Auto-sync triggers
CREATE TRIGGER observations_ai AFTER INSERT ON observations BEGIN
  INSERT INTO observations_fts(rowid, title, subtitle, narrative, facts, concepts)
  VALUES (new.id, new.title, new.subtitle, new.narrative, new.facts, new.concepts);
END;
What changed:
  • ✅ Structured fields (title, subtitle, type)
  • ✅ FTS5 full-text search
  • ✅ Project-scoped queries
  • ✅ Rich metadata for progressive disclosure
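With the external-content FTS table (content=observations) and its insert trigger, a project-scoped search joins FTS hits back to the base table. A sketch against this schema; the bm25 ordering is an assumption, not necessarily the shipped query:

```sql
-- Project-scoped full-text search: best BM25 match first,
-- newest first among equal ranks
SELECT o.id, o.title, o.subtitle, o.type
FROM observations_fts f
JOIN observations o ON o.id = f.rowid
WHERE observations_fts MATCH ?
  AND o.project = ?
ORDER BY bm25(observations_fts), o.created_at_epoch DESC
LIMIT 20;
```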

Worker Service Redesign

v3 worker:
// Multiple short SDK sessions
app.post('/process', async (req, res) => {
  const response = await query({
    prompt: buildPrompt(req.body),
    options: { maxTurns: 1 }
  });

  for await (const msg of response) {
    // Process single observation
  }

  res.json({ success: true });
});
v4 worker:
// ONE long-running SDK session
async function runWorker(sessionId: string) {
  const response = query({
    prompt: messageGenerator(),  // AsyncIterable
    options: { maxTurns: 1000 }
  });

  for await (const msg of response) {
    if (msg.type === 'text') {
      parseObservations(msg.content);
      parseSummaries(msg.content);
    }
  }
}
Benefits:
  • Maintains conversation state
  • SDK handles context automatically
  • More efficient (fewer API calls)
  • Natural multi-turn flow

Critical Fixes Along the Way

Fix 1: Context Injection Pollution (v4.3.1)

Problem: SessionStart hook output polluted with npm install logs
# Hook output contained:
npm WARN deprecated ...
npm WARN deprecated ...
{"hookSpecificOutput": {"additionalContext": "..."}}
Why it broke:
  • Claude Code expects clean JSON or plain text
  • stderr/stdout from npm install mixed with hook output
  • Context didn’t inject properly
Solution:
{
  "command": "npm install --loglevel=silent && node context-hook.js"
}
Result: Clean JSON output, context injection works

Fix 2: Double Shebang Issue (v4.3.1)

Problem: Hook executables had duplicate shebangs
#!/usr/bin/env node
#!/usr/bin/env node  // ← Duplicate!

// Rest of code...
Why it happened:
  • Source files had shebang
  • esbuild added another shebang during build
Solution:
// Remove shebangs from source files
// Let esbuild add them during build
Result: Clean executables, no parsing errors

Fix 3: FTS5 Injection Vulnerability (v4.2.3)

Problem: User input passed directly to FTS5 query
// ❌ Vulnerable
const results = db.query(
  `SELECT * FROM observations_fts WHERE observations_fts MATCH '${userQuery}'`
);
Attack:
userQuery = "'; DROP TABLE observations; --"
Solution:
// ✅ Safe: Use parameterized queries
const results = db.query(
  'SELECT * FROM observations_fts WHERE observations_fts MATCH ?',
  [userQuery]
);

Fix 4: NOT NULL Constraint Violation (v4.2.8)

Problem: Session creation failed when prompt was empty
INSERT INTO sdk_sessions (claude_session_id, user_prompt, ...)
VALUES ('abc123', NULL, ...)  -- ❌ user_prompt is NOT NULL
Solution:
// Allow NULL user_prompts
user_prompt: input.prompt ?? null
Schema change:
-- Before
user_prompt TEXT NOT NULL

-- After
user_prompt TEXT  -- Nullable

Performance Improvements

Optimization 1: Prepared Statements

Before:
for (const obs of observations) {
  db.run(`INSERT INTO observations (...) VALUES (?, ?, ...)`, [obs.id, obs.text, ...]);
}
After:
const stmt = db.prepare(`INSERT INTO observations (...) VALUES (?, ?, ...)`);
for (const obs of observations) {
  stmt.run([obs.id, obs.text, ...]);
}
stmt.finalize();
Impact: 5x faster bulk inserts

Optimization 2: FTS5 Indexing

Before:
// Manual full-text search
const results = db.query(
  `SELECT * FROM observations WHERE text LIKE '%${query}%'`
);
After:
// FTS5 virtual table
const results = db.query(
  `SELECT * FROM observations_fts WHERE observations_fts MATCH ?`,
  [query]
);
Impact: 100x faster searches on large datasets

Optimization 3: Index Format Default

Before:
// Always return full observations
search_observations({ query: "hooks" });
// Returns: 5,000 tokens
After:
// Default to index format
search_observations({ query: "hooks", format: "index" });
// Returns: 200 tokens

// Fetch full only when needed
search_observations({ query: "hooks", format: "full", limit: 1 });
// Returns: 150 tokens
Impact: 25x reduction in average search result size

What We Learned

Lesson 1: Context is Precious

Principle: Every token you put in the context window costs attention. Application:
  • Progressive disclosure reduces waste by 87%
  • Index-first approach gives agent control
  • Token counts make costs visible

Lesson 2: Session State is Complicated

Principle: Distributed state is hard. SDK handles it better than we can. Application:
  • Use SDK’s built-in session resumption
  • Don’t try to manually reconstruct state
  • Track session IDs from init messages

Lesson 3: Graceful Beats Aggressive

Principle: Let processes finish their work before terminating. Application:
  • Graceful cleanup prevents data loss
  • Workers finish important operations
  • Clean state transitions reduce bugs

Lesson 4: AI is the Compressor

Principle: Don’t compress manually. Let AI do semantic compression. Application:
  • 10:1 to 100:1 compression ratios
  • Semantic understanding, not keyword extraction
  • Structured outputs (XML parsing)

Lesson 5: Progressive Everything

Principle: Show metadata first, fetch details on-demand. Application:
  • Progressive disclosure in context injection
  • Index format in search results
  • Layer 1 (titles) → Layer 2 (summaries) → Layer 3 (full details)

The Road Ahead

Planned: Adaptive Index Size

SessionStart({ source: "startup" }):
Show last 10 sessions (normal)

SessionStart({ source: "resume" }):
Show only current session (minimal)

SessionStart({ source: "compact" }):
Show last 20 sessions (comprehensive)

Planned: Relevance Scoring

// Use embeddings to pre-sort index by semantic relevance
search_observations({
  query: "authentication bug",
  sort: "relevance"  // Based on embeddings
});
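Embedding-based re-ranking reduces to sorting by similarity between a query embedding and stored observation embeddings. A hedged sketch; where the embeddings come from (Chroma, an embeddings API) is left open, and `rankByRelevance` is an illustrative name:

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Sort items most-similar-first relative to the query embedding.
function rankByRelevance<T extends { embedding: number[] }>(
  query: number[],
  items: T[],
): T[] {
  return [...items].sort(
    (x, y) => cosine(query, y.embedding) - cosine(query, x.embedding),
  );
}
```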

Planned: Multi-Project Context

// Cross-project pattern recognition
search_observations({
  query: "API rate limiting",
  projects: ["api-gateway", "user-service", "billing-service"]
});

Planned: Collaborative Memory

// Team-shared observations (optional)
createObservation({
  title: "Rate limit: 100 req/min",
  scope: "team"  // vs "user"
});

Migration Guide: v3 → v5

Step 1: Backup Database

cp ~/.claude-mem/claude-mem.db ~/.claude-mem/claude-mem-v3-backup.db

Step 2: Update the Marketplace

cd ~/.claude/plugins/marketplaces/thedotmack
git pull

Step 3: Update Plugin

/plugin update claude-mem
What happens automatically:
  • Dependencies update (including new ones like chromadb for v5.0.0+)
  • Database schema migrations run automatically
  • Worker service restarts with new code
  • Smart install caching activates (v5.0.3+)

Step 4: Test

# Start Claude Code
claude

# Check that context is injected
# (Should see progressive disclosure index with v5 viewer link)

# Open viewer UI (v5.1.0+)
open http://localhost:37777

# Submit a prompt and watch real-time updates in viewer

Step 5: Explore New Features

# View memory stream in browser (v5.1.0+)
open http://localhost:37777

# Toggle theme (v5.1.2+)
# Click theme button in viewer header

# Check worker health
npm run worker:status
curl http://localhost:37777/health

Key Metrics

v3 Performance

Metric | Value
Context usage per session | ~25,000 tokens
Relevant context | ~2,000 tokens (8%)
Hook execution time | ~200ms
Search latency | ~500ms (LIKE queries)

v4 Performance

Metric | Value
Context usage per session | ~1,100 tokens
Relevant context | ~1,100 tokens (100%)
Hook execution time | ~45ms
Search latency | ~15ms (FTS5)

v5 Performance

Metric | Value
Context usage per session | ~1,100 tokens
Relevant context | ~1,100 tokens (100%)
Hook execution time | ~10ms (cached install)
Search latency | ~12ms (FTS5) or ~25ms (hybrid)
Viewer UI load time | ~50ms (bundled HTML)
SSE update latency | ~5ms (real-time)
v3 → v4 Improvements:
  • 96% reduction in context waste
  • 12x increase in relevance
  • 4x faster hooks
  • 33x faster search
v4 → v5 Improvements:
  • 78% faster hooks (smart caching)
  • Real-time visualization (viewer UI)
  • Better search relevance (hybrid)
  • Enhanced UX (theme toggle, persistence)

Conclusion

The journey from v3 to v5 was about understanding these fundamental truths:
  1. Context is finite - Progressive disclosure respects attention budget
  2. AI is the compressor - Semantic understanding beats keyword extraction
  3. Agents are smart - Let them decide what to fetch
  4. State is hard - Use SDK’s built-in mechanisms
  5. Graceful wins - Let processes finish cleanly
The result is a memory system that’s both powerful and invisible. Users never notice it working - Claude just gets smarter over time. v5 adds visibility: Now users CAN see the memory system working if they want (via viewer UI), but it’s still non-intrusive.

This architecture evolution reflects hundreds of hours of experimentation, dozens of dead ends, and the invaluable experience of real-world usage. v5 is the architecture that emerged from understanding what actually works - and making it visible to users.