Search Architecture

Claude-mem uses an MCP-based search architecture that provides intelligent memory retrieval through 4 streamlined tools following a 3-layer workflow pattern.

Overview

Architecture: MCP Tools → MCP Protocol → HTTP API → Worker Service

Key Components:
  1. MCP Tools (4 tools) - search, timeline, get_observations, __IMPORTANT
  2. MCP Server (plugin/scripts/mcp-server.cjs) - Thin wrapper over HTTP API
  3. HTTP API Endpoints - Fast search operations on Worker Service (port 37777)
  4. Worker Service - Express.js server with FTS5 full-text search
  5. SQLite Database - Persistent storage with FTS5 virtual tables
  6. Chroma Vector DB - Semantic search with hybrid retrieval
Token Efficiency: ~10x savings through 3-layer workflow pattern

How It Works

1. User Query

Claude has access to 4 MCP tools. When searching memory, Claude follows the 3-layer workflow:
Step 1: search(query="authentication bug", type="bugfix", limit=10)
Step 2: timeline(anchor=<observation_id>, depth_before=3, depth_after=3)
Step 3: get_observations(ids=[123, 456, 789])

2. MCP Protocol

MCP server receives tool call via JSON-RPC over stdio:
{
  "method": "tools/call",
  "params": {
    "name": "search",
    "arguments": {
      "query": "authentication bug",
      "type": "bugfix",
      "limit": 10
    }
  }
}
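As a hedged sketch (the real server is built on the MCP SDK over stdio; the handler bodies here are placeholders, not claude-mem code), the dispatch step can be pictured as a lookup from tool name to handler:

```javascript
// Minimal sketch of tools/call dispatch. Tool names mirror the four
// tools described on this page; handler bodies are stand-ins.
const handlers = {
  search: (args) => `search:${args.query}`,
  timeline: (args) => `timeline:${args.anchor}`,
  get_observations: (args) => `get_observations:${args.ids.join(',')}`,
};

function dispatch(message) {
  if (message.method !== 'tools/call') {
    throw new Error(`Unsupported method: ${message.method}`);
  }
  // JSON-RPC params carry the tool name and its arguments object
  const { name, arguments: args } = message.params;
  const handler = handlers[name];
  if (!handler) throw new Error(`Unknown tool: ${name}`);
  return handler(args);
}

// The JSON-RPC message above, routed to the search handler:
const result = dispatch({
  method: 'tools/call',
  params: {
    name: 'search',
    arguments: { query: 'authentication bug', type: 'bugfix', limit: 10 },
  },
});
// result → 'search:authentication bug'
```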

3. HTTP API Call

MCP server translates to HTTP request:
const url = `http://localhost:37777/api/search?query=authentication%20bug&type=bugfix&limit=10`;
const response = await fetch(url);

4. Worker Processing

Worker service executes FTS5 query:
SELECT * FROM observations_fts
WHERE observations_fts MATCH ?
AND type = 'bugfix'
ORDER BY rank
LIMIT 10

5. Results Returned

Worker returns structured data → MCP server → Claude:
{
  "content": [{
    "type": "text",
    "text": "| ID | Time | Title | Type |\n|---|---|---|---|\n| #123 | 2:15 PM | Fixed auth token expiry | bugfix |"
  }]
}
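Because the index is a compact plain-text table, filtering it down to candidate IDs is mechanical. A small illustrative parser (not part of claude-mem) for the table format shown above:

```javascript
// Extract observation IDs from the compact index table returned by
// the search tool. The "| #123 | ..." row format is taken from the
// example above; this parser is illustrative only.
function extractIds(markdownTable) {
  const ids = [];
  for (const line of markdownTable.split('\n')) {
    const match = line.match(/^\|\s*#(\d+)\s*\|/); // matches "| #123 |"
    if (match) ids.push(Number(match[1]));
  }
  return ids;
}

const index =
  '| ID | Time | Title | Type |\n' +
  '|---|---|---|---|\n' +
  '| #123 | 2:15 PM | Fixed auth token expiry | bugfix |';

const ids = extractIds(index);
// ids → [123]
```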

6. Claude Processes Results

Claude reviews the index, decides which observations are relevant, and can:
  • Use timeline to get context
  • Use get_observations to fetch full details for selected IDs

The 4 MCP Tools

__IMPORTANT - Workflow Documentation

Always visible to Claude. Explains the 3-layer workflow pattern. Description:
3-LAYER WORKFLOW (ALWAYS FOLLOW):
1. search(query) → Get index with IDs (~50-100 tokens/result)
2. timeline(anchor=ID) → Get context around interesting results
3. get_observations([IDs]) → Fetch full details ONLY for filtered IDs
NEVER fetch full details without filtering first. 10x token savings.
Purpose: Ensures Claude follows the token-efficient pattern

search - Search Memory Index

Tool Definition:
{
  name: 'search',
  description: 'Step 1: Search memory. Returns index with IDs. Params: query, limit, project, type, obs_type, dateStart, dateEnd, offset, orderBy',
  inputSchema: {
    type: 'object',
    properties: {},
    additionalProperties: true  // Accepts any parameters
  }
}
HTTP Endpoint: GET /api/search

Parameters:
  • query - Full-text search query
  • limit - Maximum results (default: 20)
  • type - Filter by observation type
  • project - Filter by project name
  • dateStart, dateEnd - Date range filters
  • offset - Pagination offset
  • orderBy - Sort order
Returns: Compact index with IDs, titles, dates, types (~50-100 tokens per result)
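These parameters map directly onto the query string. A minimal sketch using the endpoint and port described on this page (note that URLSearchParams encodes spaces as +, which servers decode the same as %20):

```javascript
// Build the GET /api/search URL from the parameters listed above.
// Parameter names come from this page; the function itself is a sketch.
function buildSearchUrl(params) {
  const qs = new URLSearchParams();
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined) qs.append(key, String(value));
  }
  return `http://localhost:37777/api/search?${qs}`;
}

const url = buildSearchUrl({ query: 'authentication bug', type: 'bugfix', limit: 10 });
// url → 'http://localhost:37777/api/search?query=authentication+bug&type=bugfix&limit=10'
```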

timeline - Get Chronological Context

Tool Definition:
{
  name: 'timeline',
  description: 'Step 2: Get context around results. Params: anchor (observation ID) OR query (finds anchor automatically), depth_before, depth_after, project',
  inputSchema: {
    type: 'object',
    properties: {},
    additionalProperties: true
  }
}
HTTP Endpoint: GET /api/timeline

Parameters:
  • anchor - Observation ID to center timeline around (optional if query provided)
  • query - Search query to find anchor automatically (optional if anchor provided)
  • depth_before - Number of observations before anchor (default: 3)
  • depth_after - Number of observations after anchor (default: 3)
  • project - Filter by project name
Returns: Chronological view showing what happened before/during/after
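Conceptually, the timeline is a window of neighbors around the anchor in date order. A simplified in-memory sketch (the real implementation queries SQLite; the parameter defaults follow the list above):

```javascript
// Given observations sorted chronologically and an anchor ID, return
// depth_before neighbors, the anchor, and depth_after neighbors.
function timelineWindow(observations, anchor, depthBefore = 3, depthAfter = 3) {
  const i = observations.findIndex((o) => o.id === anchor);
  if (i === -1) throw new Error(`Anchor ${anchor} not found`);
  // Clamp at the start so early anchors don't produce negative indices
  return observations.slice(Math.max(0, i - depthBefore), i + depthAfter + 1);
}

const obs = [10, 11, 12, 13, 14, 15, 16].map((id) => ({ id }));
const window = timelineWindow(obs, 13, 2, 1);
// window ids → [11, 12, 13, 14]
```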

get_observations - Fetch Full Details

Tool Definition:
{
  name: 'get_observations',
  description: 'Step 3: Fetch full details for filtered IDs. Params: ids (array of observation IDs, required), orderBy, limit, project',
  inputSchema: {
    type: 'object',
    properties: {
      ids: {
        type: 'array',
        items: { type: 'number' },
        description: 'Array of observation IDs to fetch (required)'
      }
    },
    required: ['ids'],
    additionalProperties: true
  }
}
HTTP Endpoint: POST /api/observations/batch

Body:
{
  "ids": [123, 456, 789],
  "orderBy": "date_desc",
  "project": "my-app"
}
Returns: Complete observation details (~500-1,000 tokens per observation)
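A sketch of assembling that request body, reflecting that ids is the only required field (field names are taken from the example body above):

```javascript
// Build fetch options for POST /api/observations/batch. Optional
// fields are included only when provided; `ids` is required.
function buildBatchRequest({ ids, orderBy, project } = {}) {
  if (!Array.isArray(ids) || ids.length === 0) {
    throw new Error('ids (non-empty array) is required');
  }
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ ids, ...(orderBy && { orderBy }), ...(project && { project }) }),
  };
}

const req = buildBatchRequest({ ids: [123, 456, 789], orderBy: 'date_desc', project: 'my-app' });
// req.body → '{"ids":[123,456,789],"orderBy":"date_desc","project":"my-app"}'
```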

MCP Server Implementation

Location: /Users/YOUR_USERNAME/.claude/plugins/marketplaces/thedotmack/plugin/scripts/mcp-server.cjs
Role: Thin wrapper that translates MCP protocol to HTTP API calls
Key Characteristics:
  • ~312 lines of code (reduced from ~2,718 lines in the old implementation)
  • No business logic - just protocol translation
  • Single source of truth: Worker HTTP API
  • Simple schemas with additionalProperties: true
Handler Example:
{
  name: 'search',
  handler: async (args: any) => {
    const endpoint = '/api/search';
    const searchParams = new URLSearchParams();

    for (const [key, value] of Object.entries(args)) {
      searchParams.append(key, String(value));
    }

    const url = `http://localhost:37777${endpoint}?${searchParams}`;
    const response = await fetch(url);
    return await response.json();
  }
}

Worker HTTP API

Location: src/services/worker-service.ts
Port: 37777
Search Endpoints:
GET  /api/search           # Main search (used by MCP search tool)
GET  /api/timeline         # Timeline context (used by MCP timeline tool)
POST /api/observations/batch  # Fetch by IDs (used by MCP get_observations tool)
GET  /api/health           # Health check
Database Access:
  • Uses SessionSearch service for FTS5 queries
  • Uses SessionStore for structured queries
  • Hybrid search with ChromaDB for semantic similarity
FTS5 Full-Text Search:
-- search tool → HTTP GET → FTS5 query
SELECT * FROM observations_fts
WHERE observations_fts MATCH ?
AND type = ?
AND date >= ? AND date <= ?
ORDER BY rank
LIMIT ? OFFSET ?
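Because the filters are optional, the SQL is naturally assembled incrementally with placeholders, keeping user input out of the query string. An illustrative sketch (the real worker delegates this to its SessionSearch service):

```javascript
// Assemble the parameterized FTS5 query shown above. Optional filters
// append a clause and push a bound parameter; nothing is interpolated.
function buildFtsQuery({ match, type, dateStart, dateEnd, limit = 20, offset = 0 }) {
  let sql = 'SELECT * FROM observations_fts WHERE observations_fts MATCH ?';
  const params = [match];
  if (type) { sql += ' AND type = ?'; params.push(type); }
  if (dateStart) { sql += ' AND date >= ?'; params.push(dateStart); }
  if (dateEnd) { sql += ' AND date <= ?'; params.push(dateEnd); }
  sql += ' ORDER BY rank LIMIT ? OFFSET ?';
  params.push(limit, offset);
  return { sql, params };
}

const { sql, params } = buildFtsQuery({ match: 'authentication bug', type: 'bugfix', limit: 10 });
// params → ['authentication bug', 'bugfix', 10, 0]
```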

The 3-Layer Workflow Pattern

Design Philosophy

The 3-layer workflow embodies progressive disclosure - a core principle of claude-mem’s architecture.

Layer 1: Index (Search)
  • What: Compact table with IDs, titles, dates, types
  • Cost: ~50-100 tokens per result
  • Purpose: Survey what exists before committing tokens
  • Decision Point: “Which observations are relevant?”
Layer 2: Context (Timeline)
  • What: Chronological view of observations around a point
  • Cost: Variable based on depth
  • Purpose: Understand narrative arc, see what led to/from a point
  • Decision Point: “Do I need full details?”
Layer 3: Details (Get Observations)
  • What: Complete observation data (narrative, facts, files, concepts)
  • Cost: ~500-1,000 tokens per observation
  • Purpose: Deep dive on validated, relevant observations
  • Decision Point: “Apply knowledge to current task”

Token Efficiency

Traditional RAG Approach:
Fetch 20 observations upfront: 10,000-20,000 tokens
Relevance: ~10% (only 2 observations actually useful)
Waste: up to 18,000 tokens on irrelevant context
3-Layer Workflow:
Step 1: search (20 results)        ~1,000-2,000 tokens
Step 2: Review index, filter to 3 relevant IDs
Step 3: get_observations (3 IDs)   ~1,500-3,000 tokens
Total: 2,500-5,000 tokens (50-75% savings)
10x Savings: By filtering at index level before fetching full details
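The arithmetic behind these figures can be checked with midpoint estimates (the per-item token counts below are rough averages taken from this page, not measurements):

```javascript
// Midpoints of the ranges quoted above (assumptions for illustration).
const perObservation = 750; // midpoint of ~500-1,000 tokens per full observation
const perIndexRow = 75;     // midpoint of ~50-100 tokens per index result

// Traditional RAG: fetch all 20 observations upfront.
const traditional = 20 * perObservation;

// 3-layer workflow: scan a 20-row index, then fetch 3 filtered IDs.
const layered = 20 * perIndexRow + 3 * perObservation;

const savings = 1 - layered / traditional;
// traditional → 15000, layered → 3750, savings → 0.75
```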

Architecture Evolution

Before: Complex MCP Implementation

Approach: 9 MCP tools with detailed parameter schemas
Token Cost: ~2,500 tokens in tool definitions per session
Tools:
  • search_observations - Full-text search
  • find_by_type - Filter by type
  • find_by_file - Filter by file
  • find_by_concept - Filter by concept
  • get_recent_context - Recent sessions
  • get_observation - Fetch single observation
  • get_session - Fetch session
  • get_prompt - Fetch prompt
  • help - API documentation
Problems:
  • Overlapping operations (search_observations vs find_by_type)
  • Complex parameter schemas
  • No built-in workflow guidance
  • High token cost at session start
Code Size: ~2,718 lines in mcp-server.ts

After: Streamlined MCP Implementation

Approach: 4 MCP tools following the 3-layer workflow
Token Cost: Minimal - simplified tool definitions (~312 lines of code)
Tools:
  1. __IMPORTANT - Workflow guidance (always visible)
  2. search - Step 1 (index)
  3. timeline - Step 2 (context)
  4. get_observations - Step 3 (details)
Benefits:
  • Progressive disclosure built into tool design
  • No overlapping operations
  • Simple schemas (additionalProperties: true)
  • Clear workflow pattern
  • ~10x token savings
Code Size: ~312 lines in mcp-server.ts (88% reduction)

Key Insight

Before: Progressive disclosure was something Claude had to remember.
After: Progressive disclosure is enforced by the tool design itself.

The 3-layer workflow pattern makes it structurally difficult to waste tokens:
  • Can’t fetch details without first getting IDs from search
  • Can’t search without seeing workflow reminder (__IMPORTANT)
  • Timeline provides middle ground between index and full details

Configuration

Claude Desktop

Add to claude_desktop_config.json:
{
  "mcpServers": {
    "mcp-search": {
      "command": "node",
      "args": [
        "/Users/YOUR_USERNAME/.claude/plugins/marketplaces/thedotmack/plugin/scripts/mcp-server.cjs"
      ]
    }
  }
}

Claude Code

The MCP server is configured automatically when the plugin is installed - no manual setup required. Both clients use the same MCP tools; the architecture works identically in Claude Desktop and Claude Code.

Security

FTS5 Injection Prevention

All search queries are escaped before FTS5 processing:
function escapeFTS5Query(query: string): string {
  return query.replace(/"/g, '""');
}
Testing: 332 injection attack tests covering special characters, SQL keywords, quote escaping, and boolean operators.
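A usage sketch of the escaping rule above. Whether the worker then wraps the result in quotes to form an FTS5 phrase is an assumption, flagged in the comment:

```javascript
// Same escaping rule as shown above: double any embedded quotes so the
// string is safe inside an FTS5 string token.
function escapeFTS5Query(query) {
  return query.replace(/"/g, '""');
}

const escaped = escapeFTS5Query('say "hi" to FTS5');
// escaped → 'say ""hi"" to FTS5'

// Assumption: a caller would typically wrap the escaped text in quotes
// (`"${escaped}"`) before passing it as the MATCH argument; the exact
// wrapping depends on how the worker constructs its queries.
```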

MCP Protocol Security

  • Stdio transport (no network exposure)
  • Local-only HTTP API (localhost:37777)
  • No authentication needed (local development only)

Performance

  • FTS5 Full-Text Search: Sub-10ms for typical queries
  • MCP Overhead: Minimal - simple protocol translation
  • Caching: HTTP layer allows response caching (future enhancement)
  • Pagination: Efficient with offset/limit
  • Batching: get_observations accepts multiple IDs in a single call

Benefits Over Alternative Approaches

vs. Traditional RAG

Traditional RAG:
  • Fetches everything upfront
  • High token cost
  • Low relevance ratio
3-Layer MCP:
  • Fetches only what’s needed
  • ~10x token savings
  • 100% relevance (Claude chooses what to fetch)

vs. Previous MCP Implementation (v5.x)

Previous (9 tools):
  • Complex schemas
  • Overlapping operations
  • No workflow guidance
  • ~2,500 tokens in definitions
Current (4 tools):
  • Simple schemas
  • Clear workflow
  • Built-in guidance
  • ~312 lines of code

vs. Skill-Based Approach (Previously)

Skill approach:
  • Required separate skill files
  • HTTP API called directly via curl
  • Progressive disclosure through skill loading
MCP approach:
  • Native MCP protocol (better Claude integration)
  • Cleaner architecture (protocol translation layer)
  • Works with both Claude Desktop and Claude Code
  • Simpler to maintain (no skill files)
Migration: Skill-based search was removed in favor of streamlined MCP architecture.

Troubleshooting

MCP Server Not Connected

Symptoms: Tools not appearing in Claude
Solution:
  1. Check MCP server path in configuration
  2. Verify worker service is running: curl http://localhost:37777/api/health
  3. Restart Claude Desktop/Code

Worker Service Not Running

Symptoms: MCP tools fail with connection errors
Solution:
npm run worker:status       # Check status
npm run worker:restart      # Restart worker
npm run worker:logs         # View logs

Empty Search Results

Symptoms: search() returns no results
Troubleshooting:
  1. Test API directly: curl "http://localhost:37777/api/search?query=test"
  2. Check database: ls ~/.claude-mem/claude-mem.db
  3. Verify observations exist: curl "http://localhost:37777/api/health"

Next Steps