Manual Recovery Guide
Overview
Claude-mem’s manual recovery system helps you recover observations that get stuck in the processing queue after worker crashes, system restarts, or unexpected shutdowns.
Key Change in v5.x: Automatic recovery on worker startup is now disabled. This gives you explicit control over when reprocessing happens, preventing unexpected duplicate observations.
When Do You Need Manual Recovery?
You should trigger manual recovery when:
- Worker crashed or restarted - Observations were queued but worker stopped before processing
- No new summaries appearing - Observations are being saved but not processed into summaries
- Stuck messages detected - Messages showing as “processing” for >5 minutes
- System crashes - Unexpected shutdowns left messages in incomplete states
Quick Start
Using the CLI Tool (Recommended)
The interactive CLI tool is the safest and easiest way to recover stuck observations. Running bun scripts/check-pending-queue.ts will:
- Check worker health
- Show queue summary (pending, processing, failed, stuck counts)
- Display sessions with pending work
- Prompt you to confirm recovery
- Show recently processed messages for feedback
Auto-Process Without Prompts
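For scripting or when you’re confident recovery is needed, add the --process flag to skip the confirmation prompt: bun scripts/check-pending-queue.ts --process
Understanding Queue States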
Messages progress through these lifecycle states:
- pending → Queued, waiting to process
- processing → Currently being processed by SDK agent
- processed → Completed successfully
- failed → Failed after 3 retry attempts
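The lifecycle above can be sketched as a small type plus one transition rule. This is a minimal illustration, not claude-mem's actual implementation: `QueueStatus`, `MAX_ATTEMPTS`, and `statusAfterFailure` are hypothetical names, and the sketch assumes a failed attempt sends the message back to pending until the third failure.

```typescript
// Hypothetical names. Only the four status values and the 3-attempt limit
// come from the guide.
type QueueStatus = "pending" | "processing" | "processed" | "failed";

const MAX_ATTEMPTS = 3;

// Status a message moves to after a processing attempt fails.
// Assumption: failures below the limit return the message to "pending".
function statusAfterFailure(failedAttempts: number): QueueStatus {
  return failedAttempts >= MAX_ATTEMPTS ? "failed" : "pending";
}
```

Under this assumption, a message only reaches failed once its failure count hits the limit, which matches the "won't auto-retry" behavior described later in this guide.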
Stuck Detection
Messages in processing state for >5 minutes are considered stuck:
- They’re automatically reset to pending on worker startup
- They’re NOT automatically reprocessed (requires manual trigger)
- They appear in the stuckCount field when checking queue status
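The stuck rule described above amounts to a status check plus an age check. The sketch below assumes a hypothetical message shape (`status`, `startedAtMs`); the real schema may use different field names and time units.

```typescript
// Hypothetical message shape: field names are assumptions, not the real schema.
interface QueueMessage {
  status: "pending" | "processing" | "processed" | "failed";
  startedAtMs: number | null; // epoch milliseconds when processing began, if it has
}

// The 5-minute threshold from the guide, in milliseconds.
const STUCK_THRESHOLD_MS = 5 * 60 * 1000;

// A message is stuck when it has been "processing" for longer than the threshold.
function isStuck(message: QueueMessage, nowMs: number): boolean {
  if (message.status !== "processing" || message.startedAtMs === null) {
    return false;
  }
  return nowMs - message.startedAtMs > STUCK_THRESHOLD_MS;
}
```

Passing the current time in as `nowMs` rather than calling `Date.now()` inside the function keeps the check easy to test.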
Recovery Methods
Method 1: Interactive CLI Tool
Best for: Regular users, interactive sessions, when you want visibility into what’s happening
- ✅ Pre-flight health check (verifies worker is running)
- ✅ Detailed queue breakdown by session
- ✅ Age tracking for stuck detection
- ✅ Confirmation prompt (prevents accidental reprocessing)
- ✅ Non-interactive mode with --process flag
- ✅ Session limit control with --limit N
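If you invoke the CLI tool from your own scripts, a small helper can assemble the arguments. The sketch below only builds the argument list and does not run anything. The `RecoveryOptions` names are hypothetical, and it assumes --process and --limit N can be combined in a single invocation, which this guide does not state explicitly.

```typescript
// Options mirror the two flags listed above; the option names are hypothetical.
interface RecoveryOptions {
  process?: boolean; // non-interactive mode (--process)
  limit?: number;    // session limit (--limit N)
}

// Builds the argument list for the recovery script without executing it.
// Assumption: both flags may be passed together.
function buildRecoveryCommand(options: RecoveryOptions = {}): string[] {
  const args = ["bun", "scripts/check-pending-queue.ts"];
  if (options.process) {
    args.push("--process");
  }
  if (options.limit !== undefined) {
    if (!Number.isInteger(options.limit) || options.limit < 1) {
      throw new RangeError("limit must be a positive integer");
    }
    args.push("--limit", String(options.limit));
  }
  return args;
}

console.log(buildRecoveryCommand({ process: true, limit: 5 }).join(" "));
// prints "bun scripts/check-pending-queue.ts --process --limit 5"
```

Validating the limit before building the command avoids handing a malformed value to the script.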
Method 2: HTTP API
Best for: Automation, scripting, integration with monitoring systems
Check Queue Status
The queue status response includes:
- totalPending - Messages waiting to process
- totalProcessing - Messages currently processing
- stuckCount - Processing messages >5 minutes old
- sessionsWithPendingWork - Session IDs needing recovery
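For automation, the status fields can be typed and reduced to a single yes/no decision. In the sketch below the field names come from this guide, but the types are assumptions (in particular, that session IDs are numeric), and `needsRecovery` is a hypothetical heuristic rather than a rule claude-mem defines.

```typescript
// Field names come from the guide; the types are assumptions.
interface PendingQueueStatus {
  totalPending: number;
  totalProcessing: number;
  stuckCount: number;
  sessionsWithPendingWork: number[]; // assumed to be numeric session IDs
}

// Heuristic (an assumption, not documented behavior): recover when messages
// are stuck, or when work is pending and nothing is currently processing.
function needsRecovery(status: PendingQueueStatus): boolean {
  const idleWithBacklog = status.totalPending > 0 && status.totalProcessing === 0;
  return status.stuckCount > 0 || idleWithBacklog;
}
```

A pending count on its own is not treated as a problem here, because messages can legitimately wait while the worker is busy with others.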
Trigger Recovery
The recovery response includes:
- totalPendingSessions - Total sessions with pending messages in database
- sessionsStarted - Sessions we started processing this request
- sessionsSkipped - Sessions already processing (prevents duplicate agents)
- startedSessionIds - Database IDs of sessions we started
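When you script recovery with a session limit, it helps to know how many sessions a request left untouched. The sketch below types the response using the field names above (the types are assumptions) and assumes that every pending session is either started, skipped, or left for a later request. `sessionsRemaining` is a hypothetical helper.

```typescript
// Field names come from the guide; the types are assumptions.
interface RecoveryResult {
  totalPendingSessions: number;
  sessionsStarted: number;
  sessionsSkipped: number;
  startedSessionIds: number[]; // assumed to be numeric database IDs
}

// Sessions this request neither started nor skipped, for example because
// a session limit was reached. Assumes the three groups do not overlap.
function sessionsRemaining(result: RecoveryResult): number {
  const handled = result.sessionsStarted + result.sessionsSkipped;
  return Math.max(0, result.totalPendingSessions - handled);
}
```

A non-zero result suggests triggering recovery again once the started sessions finish.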
Best Practices
1. Always Check Before Recovery
2. Start with Low Session Limits
3. Monitor During Recovery
Watch worker logs while recovery runs, looking for:
- SDK agent starts: Starting SDK agent for session...
- Processing completions: Processed observation...
- Errors: ERROR or Failed to process...
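If you tail the logs from a script, the patterns above can be matched with plain substring checks. This is a rough sketch: `LogEvent` and `classifyLogLine` are hypothetical names, and it assumes the log lines contain these phrases verbatim.

```typescript
type LogEvent = "agent-start" | "completion" | "error" | "other";

// Classifies one worker log line using the phrases listed in this guide.
// Errors are checked first so a line reporting a failure is never counted as progress.
function classifyLogLine(line: string): LogEvent {
  if (line.includes("ERROR") || line.includes("Failed to process")) {
    return "error";
  }
  if (line.includes("Starting SDK agent for session")) {
    return "agent-start";
  }
  if (line.includes("Processed observation")) {
    return "completion";
  }
  return "other";
}
```

Counting completion lines against error lines gives a quick sense of whether recovery is making progress.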
4. Verify Recovery Success
Check recently processed messages to confirm recovery worked.
5. Handle Failed Messages
Messages that fail 3 times are marked failed and won’t auto-retry.
Troubleshooting
Recovery Not Working
Symptom: Triggered recovery but messages still pending
Solutions:
- Verify worker health
- Check worker logs for errors
- Restart worker
- Check database integrity
Messages Stuck Forever
Symptom: Messages show as “processing” for hours
Solution: Force reset stuck messages
Worker Crashes During Recovery
Symptom: Worker stops while processing recovered messages
Solutions:
- Check available memory
- Reduce session limit
- Check for SDK errors in logs
- Increase worker memory (if using custom runner)
Advanced Usage
Direct Database Inspection
View all pending messages.
Count Messages by Status
Find Sessions with Pending Work
View Recent Failures
Integration Examples
Cron Job for Automatic Recovery
Monitoring Script
Pre-Shutdown Recovery
Migration Note
If you’re upgrading from v4.x to v5.x:
v4.x Behavior (Automatic Recovery):
- Worker automatically recovered stuck messages on startup
- No user control over reprocessing timing
v5.x Behavior (Manual Recovery):
- Stuck messages detected but NOT automatically reprocessed
- User must explicitly trigger recovery via CLI or API
- Prevents unexpected duplicate observations
- Provides explicit control over when processing happens
Migration steps:
- Upgrade to v5.x
- Check for stuck messages: bun scripts/check-pending-queue.ts
- Process if needed: bun scripts/check-pending-queue.ts --process
- Add recovery to your workflow (cron job, pre-shutdown script, etc.)
See Also
- Worker Service Architecture - Technical details on queue processing
- Troubleshooting - Manual Recovery - Common issues and solutions
- Database Schema - Pending messages table structure

