Search Architecture

Claude-mem uses an MCP-based search architecture that provides intelligent memory retrieval through 4 streamlined tools following a 3-layer workflow pattern.

Overview

Architecture: MCP Tools → MCP Protocol → HTTP API → Worker Service

Key Components:
  1. MCP Tools (4 tools) - search, timeline, get_observations, __IMPORTANT
  2. MCP Server (plugin/scripts/mcp-server.cjs) - Thin wrapper over HTTP API
  3. HTTP API Endpoints - Fast search operations on Worker Service (port 37777)
  4. Worker Service - Express.js server with FTS5 full-text search
  5. SQLite Database - Persistent storage with FTS5 virtual tables
  6. Chroma Vector DB - Semantic search with hybrid retrieval
Token Efficiency: ~10x savings through 3-layer workflow pattern

How It Works

1. User Query

Claude has access to 4 MCP tools. When searching memory, Claude follows the 3-layer workflow:
Step 1: search(query="authentication bug", type="bugfix", limit=10)
Step 2: timeline(anchor=<observation_id>, depth_before=3, depth_after=3)
Step 3: get_observations(ids=[123, 456, 789])

2. MCP Protocol

MCP server receives tool call via JSON-RPC over stdio:
{
  "method": "tools/call",
  "params": {
    "name": "search",
    "arguments": {
      "query": "authentication bug",
      "type": "bugfix",
      "limit": 10
    }
  }
}
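As a hedged sketch (the real server is built on the MCP SDK over stdio; the handler bodies here are placeholders, not claude-mem code), the dispatch step can be pictured as a lookup from tool name to handler:

```javascript
// Minimal sketch of tools/call dispatch. Tool names mirror the four
// tools described on this page; handler bodies are stand-ins.
const handlers = {
  search: (args) => `search:${args.query}`,
  timeline: (args) => `timeline:${args.anchor}`,
  get_observations: (args) => `get_observations:${args.ids.join(',')}`,
};

function dispatch(message) {
  if (message.method !== 'tools/call') {
    throw new Error(`Unsupported method: ${message.method}`);
  }
  // JSON-RPC params carry the tool name and its arguments object
  const { name, arguments: args } = message.params;
  const handler = handlers[name];
  if (!handler) throw new Error(`Unknown tool: ${name}`);
  return handler(args);
}

// The JSON-RPC message above, routed to the search handler:
const result = dispatch({
  method: 'tools/call',
  params: {
    name: 'search',
    arguments: { query: 'authentication bug', type: 'bugfix', limit: 10 },
  },
});
// result → 'search:authentication bug'
```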

3. HTTP API Call

MCP server translates to HTTP request:
const url = `http://localhost:37777/api/search?query=authentication%20bug&type=bugfix&limit=10`;
const response = await fetch(url);

4. Worker Processing

Worker service executes FTS5 query:
SELECT * FROM observations_fts
WHERE observations_fts MATCH ?
AND type = 'bugfix'
ORDER BY rank
LIMIT 10

5. Results Returned

Worker returns structured data → MCP server → Claude:
{
  "content": [{
    "type": "text",
    "text": "| ID | Time | Title | Type |\n|---|---|---|---|\n| #123 | 2:15 PM | Fixed auth token expiry | bugfix |"
  }]
}
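Because the index is a compact plain-text table, filtering it down to candidate IDs is mechanical. A small illustrative parser (not part of claude-mem) for the table format shown above:

```javascript
// Extract observation IDs from the compact index table returned by
// the search tool. The "| #123 | ..." row format is taken from the
// example above; this parser is illustrative only.
function extractIds(markdownTable) {
  const ids = [];
  for (const line of markdownTable.split('\n')) {
    const match = line.match(/^\|\s*#(\d+)\s*\|/); // matches "| #123 |"
    if (match) ids.push(Number(match[1]));
  }
  return ids;
}

const index =
  '| ID | Time | Title | Type |\n' +
  '|---|---|---|---|\n' +
  '| #123 | 2:15 PM | Fixed auth token expiry | bugfix |';

const ids = extractIds(index);
// ids → [123]
```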

6. Claude Processes Results

Claude reviews the index, decides which observations are relevant, and can:
  • Use timeline to get context
  • Use get_observations to fetch full details for selected IDs

The 4 MCP Tools

__IMPORTANT - Workflow Documentation

Always visible to Claude. Explains the 3-layer workflow pattern. Description:
3-LAYER WORKFLOW (ALWAYS FOLLOW):
1. search(query) → Get index with IDs (~50-100 tokens/result)
2. timeline(anchor=ID) → Get context around interesting results
3. get_observations([IDs]) → Fetch full details ONLY for filtered IDs
NEVER fetch full details without filtering first. 10x token savings.
Purpose: Ensures Claude follows the token-efficient pattern

search - Search Memory Index

Tool Definition:
{
  name: 'search',
  description: 'Step 1: Search memory. Returns index with IDs. Params: query, limit, project, type, obs_type, dateStart, dateEnd, offset, orderBy',
  inputSchema: {
    type: 'object',
    properties: {},
    additionalProperties: true  // Accepts any parameters
  }
}
HTTP Endpoint: GET /api/search

Parameters:
  • query - Full-text search query
  • limit - Maximum results (default: 20)
  • type - Filter by observation type
  • project - Filter by project name
  • dateStart, dateEnd - Date range filters
  • offset - Pagination offset
  • orderBy - Sort order
Returns: Compact index with IDs, titles, dates, types (~50-100 tokens per result)
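These parameters map directly onto the query string. A minimal sketch using the endpoint and port described on this page (note that URLSearchParams encodes spaces as +, which servers decode the same as %20):

```javascript
// Build the GET /api/search URL from the parameters listed above.
// Parameter names come from this page; the function itself is a sketch.
function buildSearchUrl(params) {
  const qs = new URLSearchParams();
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined) qs.append(key, String(value));
  }
  return `http://localhost:37777/api/search?${qs}`;
}

const url = buildSearchUrl({ query: 'authentication bug', type: 'bugfix', limit: 10 });
// url → 'http://localhost:37777/api/search?query=authentication+bug&type=bugfix&limit=10'
```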

timeline - Get Chronological Context

Tool Definition:
{
  name: 'timeline',
  description: 'Step 2: Get context around results. Params: anchor (observation ID) OR query (finds anchor automatically), depth_before, depth_after, project',
  inputSchema: {
    type: 'object',
    properties: {},
    additionalProperties: true
  }
}
HTTP Endpoint: GET /api/timeline

Parameters:
  • anchor - Observation ID to center timeline around (optional if query provided)
  • query - Search query to find anchor automatically (optional if anchor provided)
  • depth_before - Number of observations before anchor (default: 3)
  • depth_after - Number of observations after anchor (default: 3)
  • project - Filter by project name
Returns: Chronological view showing what happened before/during/after
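Conceptually, the timeline is a window of neighbors around the anchor in date order. A simplified in-memory sketch (the real implementation queries SQLite; the parameter defaults follow the list above):

```javascript
// Given observations sorted chronologically and an anchor ID, return
// depth_before neighbors, the anchor, and depth_after neighbors.
function timelineWindow(observations, anchor, depthBefore = 3, depthAfter = 3) {
  const i = observations.findIndex((o) => o.id === anchor);
  if (i === -1) throw new Error(`Anchor ${anchor} not found`);
  // Clamp at the start so early anchors don't produce negative indices
  return observations.slice(Math.max(0, i - depthBefore), i + depthAfter + 1);
}

const obs = [10, 11, 12, 13, 14, 15, 16].map((id) => ({ id }));
const window = timelineWindow(obs, 13, 2, 1);
// window ids → [11, 12, 13, 14]
```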

get_observations - Fetch Full Details

Tool Definition:
{
  name: 'get_observations',
  description: 'Step 3: Fetch full details for filtered IDs. Params: ids (array of observation IDs, required), orderBy, limit, project',
  inputSchema: {
    type: 'object',
    properties: {
      ids: {
        type: 'array',
        items: { type: 'number' },
        description: 'Array of observation IDs to fetch (required)'
      }
    },
    required: ['ids'],
    additionalProperties: true
  }
}
HTTP Endpoint: POST /api/observations/batch

Body:
{
  "ids": [123, 456, 789],
  "orderBy": "date_desc",
  "project": "my-app"
}
Returns: Complete observation details (~500-1,000 tokens per observation)
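A sketch of assembling that request body, reflecting that ids is the only required field (field names are taken from the example body above):

```javascript
// Build fetch options for POST /api/observations/batch. Optional
// fields are included only when provided; `ids` is required.
function buildBatchRequest({ ids, orderBy, project } = {}) {
  if (!Array.isArray(ids) || ids.length === 0) {
    throw new Error('ids (non-empty array) is required');
  }
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ ids, ...(orderBy && { orderBy }), ...(project && { project }) }),
  };
}

const req = buildBatchRequest({ ids: [123, 456, 789], orderBy: 'date_desc', project: 'my-app' });
// req.body → '{"ids":[123,456,789],"orderBy":"date_desc","project":"my-app"}'
```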

MCP Server Implementation

Location: /Users/YOUR_USERNAME/.claude/plugins/marketplaces/thedotmack/plugin/scripts/mcp-server.cjs
Role: Thin wrapper that translates MCP protocol to HTTP API calls
Key Characteristics:
  • ~312 lines of code (reduced from ~2,718 lines in the old implementation)
  • No business logic - just protocol translation
  • Single source of truth: Worker HTTP API
  • Simple schemas with additionalProperties: true
Handler Example:
{
  name: 'search',
  handler: async (args: any) => {
    const endpoint = '/api/search';
    const searchParams = new URLSearchParams();

    for (const [key, value] of Object.entries(args)) {
      searchParams.append(key, String(value));
    }

    const url = `http://localhost:37777${endpoint}?${searchParams}`;
    const response = await fetch(url);
    return await response.json();
  }
}

Worker HTTP API

Location: src/services/worker-service.ts
Port: 37777
Search Endpoints:
GET  /api/search           # Main search (used by MCP search tool)
GET  /api/timeline         # Timeline context (used by MCP timeline tool)
POST /api/observations/batch  # Fetch by IDs (used by MCP get_observations tool)
GET  /api/health           # Health check
Database Access:
  • Uses SessionSearch service for FTS5 queries
  • Uses SessionStore for structured queries
  • Hybrid search with ChromaDB for semantic similarity
FTS5 Full-Text Search:
-- search tool → HTTP GET → FTS5 query
SELECT * FROM observations_fts
WHERE observations_fts MATCH ?
AND type = ?
AND date >= ? AND date <= ?
ORDER BY rank
LIMIT ? OFFSET ?
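Because the filters are optional, the SQL is naturally assembled incrementally with placeholders, keeping user input out of the query string. An illustrative sketch (the real worker delegates this to its SessionSearch service):

```javascript
// Assemble the parameterized FTS5 query shown above. Optional filters
// append a clause and push a bound parameter; nothing is interpolated.
function buildFtsQuery({ match, type, dateStart, dateEnd, limit = 20, offset = 0 }) {
  let sql = 'SELECT * FROM observations_fts WHERE observations_fts MATCH ?';
  const params = [match];
  if (type) { sql += ' AND type = ?'; params.push(type); }
  if (dateStart) { sql += ' AND date >= ?'; params.push(dateStart); }
  if (dateEnd) { sql += ' AND date <= ?'; params.push(dateEnd); }
  sql += ' ORDER BY rank LIMIT ? OFFSET ?';
  params.push(limit, offset);
  return { sql, params };
}

const { sql, params } = buildFtsQuery({ match: 'authentication bug', type: 'bugfix', limit: 10 });
// params → ['authentication bug', 'bugfix', 10, 0]
```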

The 3-Layer Workflow Pattern

Design Philosophy

The 3-layer workflow embodies progressive disclosure - a core principle of claude-mem’s architecture.

Layer 1: Index (Search)
  • What: Compact table with IDs, titles, dates, types
  • Cost: ~50-100 tokens per result
  • Purpose: Survey what exists before committing tokens
  • Decision Point: “Which observations are relevant?”
Layer 2: Context (Timeline)
  • What: Chronological view of observations around a point
  • Cost: Variable based on depth
  • Purpose: Understand narrative arc, see what led to/from a point
  • Decision Point: “Do I need full details?”
Layer 3: Details (Get Observations)
  • What: Complete observation data (narrative, facts, files, concepts)
  • Cost: ~500-1,000 tokens per observation
  • Purpose: Deep dive on validated, relevant observations
  • Decision Point: “Apply knowledge to current task”

Token Efficiency

Traditional RAG Approach:
Fetch 20 observations upfront: 10,000-20,000 tokens
Relevance: ~10% (only 2 observations actually useful)
Waste: up to 18,000 tokens on irrelevant context
3-Layer Workflow:
Step 1: search (20 results)        ~1,000-2,000 tokens
Step 2: Review index, filter to 3 relevant IDs
Step 3: get_observations (3 IDs)   ~1,500-3,000 tokens
Total: 2,500-5,000 tokens (50-75% savings)
10x Savings: By filtering at index level before fetching full details
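The arithmetic behind these figures can be checked with midpoint estimates (the per-item token counts below are rough averages taken from this page, not measurements):

```javascript
// Midpoints of the ranges quoted above (assumptions for illustration).
const perObservation = 750; // midpoint of ~500-1,000 tokens per full observation
const perIndexRow = 75;     // midpoint of ~50-100 tokens per index result

// Traditional RAG: fetch all 20 observations upfront.
const traditional = 20 * perObservation;

// 3-layer workflow: scan a 20-row index, then fetch 3 filtered IDs.
const layered = 20 * perIndexRow + 3 * perObservation;

const savings = 1 - layered / traditional;
// traditional → 15000, layered → 3750, savings → 0.75
```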

Architecture Evolution

Before: Complex MCP Implementation

Approach: 9 MCP tools with detailed parameter schemas
Token Cost: ~2,500 tokens in tool definitions per session
Tools:
  • search_observations - Full-text search
  • find_by_type - Filter by type
  • find_by_file - Filter by file
  • find_by_concept - Filter by concept
  • get_recent_context - Recent sessions
  • get_observation - Fetch single observation
  • get_session - Fetch session
  • get_prompt - Fetch prompt
  • help - API documentation
Problems:
  • Overlapping operations (search_observations vs find_by_type)
  • Complex parameter schemas
  • No built-in workflow guidance
  • High token cost at session start
Code Size: ~2,718 lines in mcp-server.ts

After: Streamlined MCP Implementation

Approach: 4 MCP tools following the 3-layer workflow
Token Cost: Minimal - simplified tool definitions (~312 lines of code)
Tools:
  1. __IMPORTANT - Workflow guidance (always visible)
  2. search - Step 1 (index)
  3. timeline - Step 2 (context)
  4. get_observations - Step 3 (details)
Benefits:
  • Progressive disclosure built into tool design
  • No overlapping operations
  • Simple schemas (additionalProperties: true)
  • Clear workflow pattern
  • ~10x token savings
Code Size: ~312 lines in mcp-server.ts (88% reduction)

Key Insight

Before: Progressive disclosure was something Claude had to remember.
After: Progressive disclosure is enforced by the tool design itself.

The 3-layer workflow pattern makes it structurally difficult to waste tokens:
  • Can’t fetch details without first getting IDs from search
  • Can’t search without seeing workflow reminder (__IMPORTANT)
  • Timeline provides middle ground between index and full details

Configuration

Claude Desktop

Add to claude_desktop_config.json:
{
  "mcpServers": {
    "mcp-search": {
      "command": "node",
      "args": [
        "/Users/YOUR_USERNAME/.claude/plugins/marketplaces/thedotmack/plugin/scripts/mcp-server.cjs"
      ]
    }
  }
}

Claude Code

The MCP server is configured automatically when the plugin is installed - no manual setup required. Both clients use the same MCP tools; the architecture works identically in Claude Desktop and Claude Code.

Security

FTS5 Injection Prevention

All search queries are escaped before FTS5 processing:
function escapeFTS5Query(query: string): string {
  return query.replace(/"/g, '""');
}
Testing: 332 injection attack tests covering special characters, SQL keywords, quote escaping, and boolean operators.
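A usage sketch of the escaping rule above. Whether the worker then wraps the result in quotes to form an FTS5 phrase is an assumption, flagged in the comment:

```javascript
// Same escaping rule as shown above: double any embedded quotes so the
// string is safe inside an FTS5 string token.
function escapeFTS5Query(query) {
  return query.replace(/"/g, '""');
}

const escaped = escapeFTS5Query('say "hi" to FTS5');
// escaped → 'say ""hi"" to FTS5'

// Assumption: a caller would typically wrap the escaped text in quotes
// (`"${escaped}"`) before passing it as the MATCH argument; the exact
// wrapping depends on how the worker constructs its queries.
```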

MCP Protocol Security

  • Stdio transport (no network exposure)
  • Local-only HTTP API (localhost:37777)
  • No authentication needed (local development only)

Performance

  • FTS5 Full-Text Search: Sub-10ms for typical queries
  • MCP Overhead: Minimal - simple protocol translation
  • Caching: HTTP layer allows response caching (future enhancement)
  • Pagination: Efficient with offset/limit
  • Batching: get_observations accepts multiple IDs in a single call

Benefits Over Alternative Approaches

vs. Traditional RAG

Traditional RAG:
  • Fetches everything upfront
  • High token cost
  • Low relevance ratio
3-Layer MCP:
  • Fetches only what’s needed
  • ~10x token savings
  • 100% relevance (Claude chooses what to fetch)

vs. Previous MCP Implementation (v5.x)

Previous (9 tools):
  • Complex schemas
  • Overlapping operations
  • No workflow guidance
  • ~2,500 tokens in definitions
Current (4 tools):
  • Simple schemas
  • Clear workflow
  • Built-in guidance
  • ~312 lines of code

vs. Skill-Based Approach (Previously)

Skill approach:
  • Required separate skill files
  • HTTP API called directly via curl
  • Progressive disclosure through skill loading
MCP approach:
  • Native MCP protocol (better Claude integration)
  • Cleaner architecture (protocol translation layer)
  • Works with both Claude Desktop and Claude Code
  • Simpler to maintain (no skill files)
Migration: Skill-based search was removed in favor of streamlined MCP architecture.

Troubleshooting

MCP Server Not Connected

Symptoms: Tools not appearing in Claude
Solution:
  1. Check MCP server path in configuration
  2. Verify worker service is running: curl http://localhost:37777/api/health
  3. Restart Claude Desktop/Code

Worker Service Not Running

Symptoms: MCP tools fail with connection errors
Solution:
npm run worker:status       # Check status
npm run worker:restart      # Restart worker
npm run worker:logs         # View logs

Empty Search Results

Symptoms: search() returns no results
Troubleshooting:
  1. Test API directly: curl "http://localhost:37777/api/search?query=test"
  2. Check database: ls ~/.claude-mem/claude-mem.db
  3. Verify observations exist: curl "http://localhost:37777/api/health"

Next Steps