# OpenRouter Provider
Claude-mem supports OpenRouter as an alternative provider for observation extraction. OpenRouter provides a unified API to access 100+ models from different providers including Google, Meta, Mistral, DeepSeek, and many others—often with generous free tiers.
Free Models Available: OpenRouter offers several completely free models, making it an excellent choice for reducing observation extraction costs to zero while maintaining quality.
## Why Use OpenRouter?
- Access to 100+ models: Choose from models across multiple providers through one API
- Free tier options: Several high-quality models are completely free to use
- Cost flexibility: Pay-as-you-go pricing on premium models with no commitments
- Seamless fallback: Automatically falls back to Claude if OpenRouter is unavailable
- Hot-swappable: Switch providers without restarting the worker
- Multi-turn conversations: Full conversation history maintained across API calls
## Free Models on OpenRouter
OpenRouter actively supports democratizing AI access by offering free models. These are production-ready models suitable for observation extraction.
### Featured Free Models
| Model | ID | Parameters | Context | Best For |
|---|---|---|---|---|
| Xiaomi MiMo-V2-Flash | xiaomi/mimo-v2-flash:free | 309B (15B active, MoE) | 256K | Reasoning, coding, agents |
| Gemini 2.0 Flash | google/gemini-2.0-flash-exp:free | — | 1M | General purpose |
| Gemini 2.5 Flash | google/gemini-2.5-flash-preview:free | — | 1M | Latest capabilities |
| DeepSeek R1 | deepseek/deepseek-r1:free | 671B | 64K | Reasoning, analysis |
| Llama 3.1 70B | meta-llama/llama-3.1-70b-instruct:free | 70B | 128K | General purpose |
| Llama 3.1 8B | meta-llama/llama-3.1-8b-instruct:free | 8B | 128K | Fast, lightweight |
| Mistral Nemo | mistralai/mistral-nemo:free | 12B | 128K | Efficient performance |
Default Model: Claude-mem uses xiaomi/mimo-v2-flash:free by default—a 309B parameter mixture-of-experts model that ranks #1 on SWE-bench Verified and excels at coding and reasoning tasks.
### Free Model Considerations
- Rate limits: Free models may have stricter rate limits than paid models
- Availability: Free capacity depends on provider partnerships and demand
- Queue times: During peak usage, requests may be queued briefly
- Max tokens: Most free models support 65,536 completion tokens
All free models support:
- Tool use and function calling
- Temperature and sampling controls
- Stop sequences
- Streaming responses
## Getting an API Key

1. Go to OpenRouter
2. Sign in with Google, GitHub, or email
3. Navigate to API Keys
4. Click Create Key
5. Copy and securely store your API key
Free to start: No credit card required to create an account or use free models. Add credits only if you want to use premium models.
## Configuration

### Settings
| Setting | Values | Default | Description |
|---|---|---|---|
| CLAUDE_MEM_PROVIDER | claude, gemini, openrouter | claude | AI provider for observation extraction |
| CLAUDE_MEM_OPENROUTER_API_KEY | string | — | Your OpenRouter API key |
| CLAUDE_MEM_OPENROUTER_MODEL | string | xiaomi/mimo-v2-flash:free | Model identifier (see list above) |
| CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES | number | 20 | Max messages in conversation history |
| CLAUDE_MEM_OPENROUTER_MAX_TOKENS | number | 100000 | Token budget safety limit |
| CLAUDE_MEM_OPENROUTER_SITE_URL | string | — | Optional: URL for analytics attribution |
| CLAUDE_MEM_OPENROUTER_APP_NAME | string | claude-mem | Optional: app name for analytics |
### Using the Settings UI

1. Open the viewer at http://localhost:37777
2. Click the gear icon to open Settings
3. Under AI Provider, select OpenRouter
4. Enter your OpenRouter API key
5. Optionally select a different model
Settings are applied immediately—no restart required.
### Manual Configuration

Edit `~/.claude-mem/settings.json`:

```json
{
  "CLAUDE_MEM_PROVIDER": "openrouter",
  "CLAUDE_MEM_OPENROUTER_API_KEY": "sk-or-v1-your-key-here",
  "CLAUDE_MEM_OPENROUTER_MODEL": "xiaomi/mimo-v2-flash:free"
}
```
Alternatively, set the API key via environment variable:

```bash
export OPENROUTER_API_KEY="sk-or-v1-your-key-here"
```
The settings file takes precedence over the environment variable.
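The precedence rule can be sketched as a tiny resolver. This is an illustrative sketch, not claude-mem's actual code; the function name and argument shapes are assumptions:

```typescript
// Hypothetical sketch of API-key resolution: the settings-file value wins
// over the environment variable, per the precedence rule above.
function resolveApiKey(
  settings: Record<string, string | undefined>,
  env: Record<string, string | undefined>
): string | undefined {
  return settings["CLAUDE_MEM_OPENROUTER_API_KEY"] ?? env["OPENROUTER_API_KEY"];
}
```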
## Model Selection Guide

### For Free Usage (No Cost)
Recommended: xiaomi/mimo-v2-flash:free
- Best-in-class performance on coding benchmarks
- 256K context window handles large observations
- 65K max completion tokens
- Mixture-of-experts architecture (15B active parameters)
Alternatives:

- google/gemini-2.0-flash-exp:free - 1M context, Google’s flagship
- deepseek/deepseek-r1:free - Excellent reasoning capabilities
- meta-llama/llama-3.1-70b-instruct:free - Strong general purpose
### For Paid Usage (Higher Quality/Speed)

| Model | Price (per 1M tokens) | Best For |
|---|---|---|
| anthropic/claude-3.5-sonnet | $3 in / $15 out | Highest quality observations |
| google/gemini-2.0-flash | $0.075 in / $0.30 out | Fast, cost-effective |
| openai/gpt-4o | $2.50 in / $10 out | GPT-4 quality |
## Context Window Management

The OpenRouter agent implements intelligent context management to prevent runaway costs:
### Automatic Truncation

The agent uses a sliding-window strategy:

- Checks whether the message count exceeds MAX_CONTEXT_MESSAGES (default: 20)
- Checks whether the estimated token count exceeds MAX_TOKENS (default: 100,000)
- If either limit is exceeded, keeps only the most recent messages
- Logs a warning with the number of dropped messages
### Token Estimation
- Conservative estimate: 1 token ≈ 4 characters
- Used for proactive context management
- Actual usage logged from API response
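The estimation heuristic and the sliding-window truncation can be sketched together as follows. This is an illustrative sketch with hypothetical names, assuming only the documented defaults (4 characters per token, 20 messages, 100,000 tokens), not the agent's real implementation:

```typescript
interface ChatMessage {
  role: string;
  content: string;
}

// Conservative token estimate: roughly 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Sliding-window truncation: keep only the most recent messages so that
// both documented limits hold.
function truncateHistory(
  messages: ChatMessage[],
  maxMessages = 20,
  maxTokens = 100_000
): ChatMessage[] {
  let kept = messages.slice(-maxMessages);
  const total = (msgs: ChatMessage[]) =>
    msgs.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  // Drop the oldest kept message until the token budget fits.
  while (kept.length > 1 && total(kept) > maxTokens) {
    kept = kept.slice(1);
  }
  const dropped = messages.length - kept.length;
  if (dropped > 0) {
    console.warn(`Context truncated: dropped ${dropped} oldest message(s)`);
  }
  return kept;
}
```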
### Cost Tracking

Logs include detailed usage information:

```
OpenRouter API usage: {
  model: "xiaomi/mimo-v2-flash:free",
  inputTokens: 2500,
  outputTokens: 1200,
  totalTokens: 3700,
  estimatedCostUSD: "0.00",
  messagesInContext: 8
}
```
## Provider Switching
You can switch between providers at any time:
- No restart required: Changes take effect on the next observation
- Conversation history preserved: When switching mid-session, the new provider sees the full conversation context
- Seamless transition: All providers use the same observation format
### Switching via UI

1. Open Settings in the viewer
2. Change the AI Provider dropdown
3. The next observation will use the new provider
### Switching via Settings File

```json
{
  "CLAUDE_MEM_PROVIDER": "openrouter"
}
```
## Fallback Behavior
If OpenRouter encounters errors, claude-mem automatically falls back to the Claude Agent SDK:
Triggers fallback:
- Rate limiting (HTTP 429)
- Server errors (HTTP 500, 502, 503)
- Network issues (connection refused, timeout)
- Generic fetch failures
Does not trigger fallback:
- Missing API key (logs warning, uses Claude from start)
- Invalid API key (fails with error)
When fallback occurs:
- A warning is logged
- Any in-progress messages are reset to pending
- Claude SDK takes over with the full conversation context
Fallback is transparent: Your observations continue processing without interruption. The fallback preserves all conversation context.
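The fallback decision described above can be sketched as a single predicate. This is a hypothetical illustration of the documented rules, not the agent's actual error handling:

```typescript
// Retryable transport/server failures fall back to the Claude SDK;
// authentication problems do not. Field names here are assumptions.
function shouldFallbackToClaude(failure: {
  httpStatus?: number;    // set when the API responded with an error code
  networkError?: boolean; // connection refused, timeout, generic fetch failure
}): boolean {
  if (failure.networkError) return true;
  if (failure.httpStatus === 429) return true; // rate limited
  if ([500, 502, 503].includes(failure.httpStatus ?? 0)) return true; // server errors
  return false; // e.g. 401 invalid key fails with an error instead
}
```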
## Multi-Turn Conversation Support

The OpenRouter agent maintains the full conversation history across API calls:
```
Session Created
  ↓
Load Pending Messages (observations from queue)
  ↓
For each message:
  → Add to conversation history
  → Call OpenRouter API with FULL history
  → Parse XML response
  → Store observations in database
  → Sync to Chroma vector DB
  ↓
Session complete
```
This enables:
- Coherent multi-turn exchanges
- Context preservation across observations
- Seamless provider switching mid-session
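The per-message loop can be sketched as follows. `callApi` stands in for the (normally asynchronous) OpenRouter request, and all names here are hypothetical; the point is that the full history, not just the latest message, is sent on every call:

```typescript
type Turn = { role: "user" | "assistant"; content: string };

// Process queued observations, appending each to the running history and
// sending the FULL history on every call. Illustrative sketch only.
function processQueue(
  pending: string[],
  callApi: (history: Turn[]) => string
): Turn[] {
  const history: Turn[] = [];
  for (const observation of pending) {
    history.push({ role: "user", content: observation });
    const reply = callApi(history); // full history, not just the last message
    history.push({ role: "assistant", content: reply });
  }
  return history;
}
```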
## Troubleshooting

### Missing API Key

If no API key is configured, either:

- Set CLAUDE_MEM_OPENROUTER_API_KEY in ~/.claude-mem/settings.json, or
- Set the OPENROUTER_API_KEY environment variable
### Rate Limiting
Free models may have rate limits during peak usage. If you hit rate limits:
- Claude-mem automatically falls back to Claude SDK
- Consider switching to a different free model
- Add credits for premium model access
### Model Not Found

Verify the model ID is correct:

- Check OpenRouter Models for current availability
- Use the :free suffix for free model variants
- Model IDs are case-sensitive
### High Token Usage Warning

If you see warnings about high token usage (>50,000 per request):

- Reduce CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES
- Reduce CLAUDE_MEM_OPENROUTER_MAX_TOKENS
- Consider a model with a larger context window
### Connection Errors
If you see connection errors:
- Check your internet connection
- Verify OpenRouter service status at status.openrouter.ai
- The agent will automatically fall back to Claude
## API Details

OpenRouter uses an OpenAI-compatible REST API:

Endpoint: `https://openrouter.ai/api/v1/chat/completions`

Headers:

```
Authorization: Bearer {apiKey}
HTTP-Referer: https://github.com/thedotmack/claude-mem
X-Title: claude-mem
Content-Type: application/json
```
Request Format:

```json
{
  "model": "xiaomi/mimo-v2-flash:free",
  "messages": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."}
  ],
  "temperature": 0.3,
  "max_tokens": 4096
}
```
## Comparing Providers

| Feature | Claude (SDK) | Gemini | OpenRouter |
|---|---|---|---|
| Cost | Pay per token | Free tier + paid | Free models + paid |
| Models | Claude only | Gemini only | 100+ models |
| Quality | Highest | High | Varies by model |
| Rate limits | Based on tier | 5-4000 RPM | Varies by model |
| Fallback | N/A (primary) | → Claude | → Claude |
| Setup | Automatic | API key required | API key required |
Recommendation: Start with OpenRouter’s free xiaomi/mimo-v2-flash:free model for zero-cost observation extraction. If you need higher quality or encounter rate limits, switch to Claude or add OpenRouter credits for premium models.
## Next Steps