Chroma Search Completion Plan
Current State Analysis
What’s Working ✅
- Hybrid Search Implementation
  - Chroma semantic search + SQLite temporal filtering is working
  - Evidence: Queries like “AI embeddings” find “hybrid search” through semantic similarity
  - All metadata-first tools use Chroma ranking
- Tools Using Chroma Correctly
  - search_observations - Semantic-first workflow (Chroma top 100 → 90-day filter → SQLite hydrate)
  - find_by_concept - Metadata-first + Chroma semantic ranking
  - find_by_file - Metadata-first + Chroma semantic ranking
  - find_by_type - Metadata-first + Chroma semantic ranking
- Data Synced to Chroma
  - ✅ Observations (all fields: narrative, facts, text as separate docs)
  - ✅ Session summaries (all fields: request, investigated, learned, completed, next_steps, notes as separate docs)
  - ❌ User prompts (NOT synced yet)
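The semantic-first workflow listed for search_observations (Chroma top 100 → 90-day filter → SQLite hydrate) could be sketched roughly as below. This is not the project's code: the `ChromaHit` shape, the function name, and the assumption that `created_at_epoch` is in milliseconds are all hypothetical. The function stops at the list of SQLite ids; hydration would then load those rows from SQLite.

```typescript
// Hypothetical hit shape: a SQLite row id plus the creation time copied from
// the Chroma metadata (assumed here to be milliseconds since the Unix epoch).
interface ChromaHit {
  sqliteId: number;
  createdAtEpoch: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

/**
 * Takes Chroma hits in rank order, keeps the top `limit`, drops anything older
 * than `windowDays`, and returns de-duplicated SQLite ids in rank order.
 */
function recentIdsFromHits(
  hits: ChromaHit[],
  nowMs: number,
  windowDays = 90,
  limit = 100,
): number[] {
  const cutoff = nowMs - windowDays * DAY_MS;
  const seen = new Set<number>();
  const ids: number[] = [];
  for (const hit of hits.slice(0, limit)) {
    if (hit.createdAtEpoch < cutoff) continue; // outside the temporal window
    if (seen.has(hit.sqliteId)) continue; // several field docs can point at one row
    seen.add(hit.sqliteId);
    ids.push(hit.sqliteId);
  }
  return ids;
}
```

De-duplicating here matters because observations and summaries are stored as one Chroma document per field, so a single SQLite row can appear several times in the ranked hits.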
What’s Missing ❌
- search_sessions tool - Only uses SQLite FTS5, not leveraging Chroma semantic search
- search_user_prompts tool - Only uses SQLite FTS5, not leveraging Chroma semantic search
- User prompts not synced to Chroma - Need to add to sync experiment and worker process
Why User Prompts Need Semantic Search
Benefits:
- Users often search for “what I asked about X” but phrase it differently than the original prompt
- Semantic search finds related requests even with different wording
- Example: Search “authentication setup” finds prompts about “login system”, “user auth”, “sign-in flow”
- Completes the triad: What was done (observations) + What was learned (summaries) + What was requested (prompts)
- Each user prompt becomes ONE document in Chroma (unlike observations/summaries which split by field)
- Metadata: sqlite_id, doc_type: 'user_prompt', sdk_session_id, project, created_at_epoch, prompt_number
- Document ID format: prompt_{id} (simpler than observations since no field splitting)
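As an illustration of the one-document-per-prompt mapping, here is a minimal sketch. The metadata keys and the `prompt_{id}` ID format come from the notes above; the `UserPromptRow` and `ChromaDocument` shapes and the `prompt_text` column name are assumptions made for the example.

```typescript
// Assumed row shape; `prompt_text` is a hypothetical column name.
interface UserPromptRow {
  id: number;
  sdk_session_id: string;
  project: string;
  created_at_epoch: number;
  prompt_number: number;
  prompt_text: string;
}

// Hypothetical container for what gets sent to Chroma for one prompt.
interface ChromaDocument {
  id: string;
  document: string;
  metadata: Record<string, string | number>;
}

/** Maps one user prompt row to exactly one Chroma document (no field splitting). */
function promptToChromaDocument(row: UserPromptRow): ChromaDocument {
  return {
    id: `prompt_${row.id}`,
    document: row.prompt_text,
    metadata: {
      sqlite_id: row.id,
      doc_type: "user_prompt",
      sdk_session_id: row.sdk_session_id,
      project: row.project,
      created_at_epoch: row.created_at_epoch,
      prompt_number: row.prompt_number,
    },
  };
}
```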
Implementation Plan
Phase 1: Sync User Prompts to Chroma
Files to modify:
- experiment/chroma-sync-experiment.ts - Add user_prompts sync section
- Future: Worker service incremental sync (not in this phase)
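Since this phase is a batch sync, one possible shape for the new sync section is to chunk the prompts and turn each chunk into parallel arrays of ids and documents. The sketch below is only an assumption about how that could look: the batch size, the helper names, and the `{ id, text }` input shape are hypothetical, and the actual call that sends each payload to Chroma is left out.

```typescript
/** Splits `items` into consecutive batches of at most `size` elements. */
function toBatches<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}

// Hypothetical payload: parallel arrays, one entry per prompt.
interface AddPayload {
  ids: string[];
  documents: string[];
}

/** Builds the payload for one batch, using the prompt_{id} document ID format. */
function toAddPayload(batch: { id: number; text: string }[]): AddPayload {
  return {
    ids: batch.map((p) => `prompt_${p.id}`),
    documents: batch.map((p) => p.text),
  };
}
```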
Phase 2: Update search_sessions to Use Chroma
File: src/servers/search-server.ts (lines ~441-481)
Current implementation:
queryChroma function to extract summary IDs from document IDs:
SessionStore.ts:
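A minimal sketch of the ID-extraction step in this phase, not the project's actual code. It assumes summary documents are stored with IDs of the form `summary_{id}_{field}`; that format is a guess based on summaries being split by field, and the function name is hypothetical.

```typescript
/**
 * Pulls the numeric summary id out of each Chroma document ID and returns the
 * distinct ids in first-seen (rank) order. IDs that do not look like
 * `summary_{id}` or `summary_{id}_{field}` are ignored.
 */
function extractSummaryIds(documentIds: string[]): number[] {
  const seen = new Set<number>();
  const ids: number[] = [];
  for (const docId of documentIds) {
    const match = /^summary_(\d+)(?:_|$)/.exec(docId);
    if (!match) continue; // observation, prompt, or malformed ID
    const id = Number(match[1]);
    if (!seen.has(id)) {
      seen.add(id);
      ids.push(id);
    }
  }
  return ids;
}
```

The resulting ids would then be passed to a lookup such as getSessionSummariesByIds to load the full rows from SQLite.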
Phase 3: Update search_user_prompts to Use Chroma
File: src/servers/search-server.ts (lines ~956-1010)
Current implementation:
prompt_{id} format:
SessionStore.ts:
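For this phase, a rough sketch of the two pieces involved: parsing `prompt_{id}` document IDs, and preparing a parameterized lookup for something like getUserPromptsByIds. The table name `user_prompts`, the `id` column, and all helper names are assumptions for illustration; the sketch only builds the SQL and parameters and does not touch a database.

```typescript
interface ParameterizedQuery {
  sql: string;
  params: number[];
}

/** Returns the numeric id from a `prompt_{id}` document ID, or null if it does not match. */
function parsePromptId(docId: string): number | null {
  const match = /^prompt_(\d+)$/.exec(docId);
  return match ? Number(match[1]) : null;
}

/** Builds a `WHERE id IN (?, ?, ...)` query; returns null when there is nothing to look up. */
function buildUserPromptsByIdsQuery(ids: number[]): ParameterizedQuery | null {
  if (ids.length === 0) return null;
  const placeholders = ids.map(() => "?").join(", ");
  return {
    sql: `SELECT * FROM user_prompts WHERE id IN (${placeholders})`,
    params: ids,
  };
}

/** Re-orders hydrated rows to follow the semantic ranking, since IN (...) does not preserve it. */
function orderByRank<T extends { id: number }>(rows: T[], rankedIds: number[]): T[] {
  const byId = new Map<number, T>();
  for (const row of rows) byId.set(row.id, row);
  const ordered: T[] = [];
  for (const id of rankedIds) {
    const row = byId.get(id);
    if (row !== undefined) ordered.push(row);
  }
  return ordered;
}
```

Using placeholders rather than interpolating ids keeps the query safe even if an ID ever comes from untrusted input.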
Phase 4: Timeline Context Tool
New tool: get_context_timeline
Purpose: Show observations/sessions/prompts around a specific point in time
API:
- Resolve anchor to a timestamp (observation.created_at_epoch, session.created_at_epoch, or parse ISO)
- Query observations within [anchor_time - depth_before_duration, anchor_time + depth_after_duration]
- Return chronologically ordered results with anchor highlighted
- Support mixing observations, sessions, and prompts in single timeline
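The steps above can be sketched as a pair of pure functions. This is an illustration under stated assumptions, not the tool's real API: the type and function names are hypothetical, `created_at_epoch` is assumed to be in milliseconds, the before/after depths are taken as durations in milliseconds, and the anchor is highlighted by matching its timestamp.

```typescript
type TimelineKind = "observation" | "session" | "prompt";

interface TimelineItem {
  kind: TimelineKind;
  id: number;
  created_at_epoch: number; // assumed milliseconds
}

interface TimelineEntry extends TimelineItem {
  isAnchor: boolean;
}

/** Resolves an anchor given either as an epoch value or as an ISO 8601 string. */
function resolveAnchorEpoch(anchor: number | string): number {
  if (typeof anchor === "number") return anchor;
  const parsed = Date.parse(anchor);
  if (Number.isNaN(parsed)) {
    throw new Error(`Unparseable anchor timestamp: ${anchor}`);
  }
  return parsed;
}

/**
 * Keeps items inside [anchor - beforeMs, anchor + afterMs], sorts them
 * chronologically, and flags every item whose timestamp equals the anchor.
 */
function buildTimeline(
  items: TimelineItem[],
  anchorEpoch: number,
  beforeMs: number,
  afterMs: number,
): TimelineEntry[] {
  const start = anchorEpoch - beforeMs;
  const end = anchorEpoch + afterMs;
  return items
    .filter((item) => item.created_at_epoch >= start && item.created_at_epoch <= end)
    .sort((a, b) => a.created_at_epoch - b.created_at_epoch)
    .map((item) => ({ ...item, isAnchor: item.created_at_epoch === anchorEpoch }));
}
```

Because all three record types share the same minimal shape here, observations, sessions, and prompts can be mixed in one input array and come back as a single ordered timeline.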
Testing Plan
Phase 1 Testing
Phase 2 Testing
Phase 3 Testing
Phase 4 Testing
Files to Modify
- experiment/chroma-sync-experiment.ts - Add user_prompts sync
- src/servers/search-server.ts - Update search_sessions and search_user_prompts, add get_context_timeline
- src/services/sqlite/SessionStore.ts - Add getSessionSummariesByIds, getUserPromptsByIds, getTimelineAroundTimestamp
- src/services/sqlite/types.ts - Ensure all return types are exported
Success Criteria
- ✅ All 8 search tools use Chroma semantic search with SQLite temporal fallback
- ✅ User prompts are synced to Chroma and searchable
- ✅ Timeline tool provides chronological context around any point
- ✅ Semantic search works across observations, sessions, and prompts
- ✅ All searches maintain 90-day temporal filtering for relevance
Future Enhancements
- Incremental sync in worker service - Currently only batch sync via experiment
- Configurable temporal windows - Make 90-day filter configurable
- Cross-collection search - Search across observations + sessions + prompts in one query
- Timeline view improvements - Group by session, highlight anchor, show relationships

