
OpenRouter Provider

Claude-mem supports OpenRouter as an alternative provider for observation extraction. OpenRouter provides a unified API to access 100+ models from different providers including Google, Meta, Mistral, DeepSeek, and many others—often with generous free tiers.
Free Models Available: OpenRouter offers several completely free models, making it an excellent choice for reducing observation extraction costs to zero while maintaining quality.

Why Use OpenRouter?

  • Access to 100+ models: Choose from models across multiple providers through one API
  • Free tier options: Several high-quality models are completely free to use
  • Cost flexibility: Pay-as-you-go pricing on premium models with no commitments
  • Seamless fallback: Automatically falls back to Claude if OpenRouter is unavailable
  • Hot-swappable: Switch providers without restarting the worker
  • Multi-turn conversations: Full conversation history maintained across API calls

Free Models on OpenRouter

OpenRouter actively supports democratizing AI access by offering free models. These are production-ready models suitable for observation extraction.
| Model | ID | Parameters | Context | Best For |
|---|---|---|---|---|
| Xiaomi MiMo-V2-Flash | xiaomi/mimo-v2-flash:free | 309B (15B active, MoE) | 256K | Reasoning, coding, agents |
| Gemini 2.0 Flash | google/gemini-2.0-flash-exp:free | N/A | 1M | General purpose |
| Gemini 2.5 Flash | google/gemini-2.5-flash-preview:free | N/A | 1M | Latest capabilities |
| DeepSeek R1 | deepseek/deepseek-r1:free | 671B | 64K | Reasoning, analysis |
| Llama 3.1 70B | meta-llama/llama-3.1-70b-instruct:free | 70B | 128K | General purpose |
| Llama 3.1 8B | meta-llama/llama-3.1-8b-instruct:free | 8B | 128K | Fast, lightweight |
| Mistral Nemo | mistralai/mistral-nemo:free | 12B | 128K | Efficient performance |
Default Model: Claude-mem uses xiaomi/mimo-v2-flash:free by default—a 309B parameter mixture-of-experts model that ranks #1 on SWE-bench Verified and excels at coding and reasoning tasks.

Free Model Considerations

  • Rate limits: Free models may have stricter rate limits than paid models
  • Availability: Free capacity depends on provider partnerships and demand
  • Queue times: During peak usage, requests may be queued briefly
  • Max tokens: Most free models support 65,536 completion tokens
All free models support:
  • Tool use and function calling
  • Temperature and sampling controls
  • Stop sequences
  • Streaming responses

Getting an API Key

  1. Go to OpenRouter
  2. Sign in with Google, GitHub, or email
  3. Navigate to API Keys
  4. Click Create Key
  5. Copy and securely store your API key
Free to start: No credit card required to create an account or use free models. Add credits only if you want to use premium models.

Configuration

Settings

| Setting | Values | Default | Description |
|---|---|---|---|
| CLAUDE_MEM_PROVIDER | claude, gemini, openrouter | claude | AI provider for observation extraction |
| CLAUDE_MEM_OPENROUTER_API_KEY | string | (none) | Your OpenRouter API key |
| CLAUDE_MEM_OPENROUTER_MODEL | string | xiaomi/mimo-v2-flash:free | Model identifier (see list above) |
| CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES | number | 20 | Max messages in conversation history |
| CLAUDE_MEM_OPENROUTER_MAX_TOKENS | number | 100000 | Token budget safety limit |
| CLAUDE_MEM_OPENROUTER_SITE_URL | string | (none) | Optional: URL for analytics attribution |
| CLAUDE_MEM_OPENROUTER_APP_NAME | string | claude-mem | Optional: App name for analytics |

Using the Settings UI

  1. Open the viewer at http://localhost:37777
  2. Click the gear icon to open Settings
  3. Under AI Provider, select OpenRouter
  4. Enter your OpenRouter API key
  5. Optionally select a different model
Settings are applied immediately—no restart required.

Manual Configuration

Edit ~/.claude-mem/settings.json:
{
  "CLAUDE_MEM_PROVIDER": "openrouter",
  "CLAUDE_MEM_OPENROUTER_API_KEY": "sk-or-v1-your-key-here",
  "CLAUDE_MEM_OPENROUTER_MODEL": "xiaomi/mimo-v2-flash:free"
}
Alternatively, set the API key via environment variable:
export OPENROUTER_API_KEY="sk-or-v1-your-key-here"
The settings file takes precedence over the environment variable.
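The precedence rule can be sketched as a small helper (the name `resolveApiKey` and the shapes below are illustrative, not claude-mem's actual internals):

```typescript
// Resolve the OpenRouter API key: the settings file wins over the environment.
// Hypothetical helper; claude-mem's real resolution logic may differ.
interface Settings {
  CLAUDE_MEM_OPENROUTER_API_KEY?: string;
}

function resolveApiKey(
  settings: Settings,
  env: Record<string, string | undefined>
): string | undefined {
  // Prefer the settings.json value; fall back to OPENROUTER_API_KEY.
  return settings.CLAUDE_MEM_OPENROUTER_API_KEY ?? env.OPENROUTER_API_KEY;
}
```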

Model Selection Guide

For Free Usage (No Cost)

Recommended: xiaomi/mimo-v2-flash:free
  • Best-in-class performance on coding benchmarks
  • 256K context window handles large observations
  • 65K max completion tokens
  • Mixture-of-experts architecture (15B active parameters)
Alternatives:
  • google/gemini-2.0-flash-exp:free - 1M context, Google’s flagship
  • deepseek/deepseek-r1:free - Excellent reasoning capabilities
  • meta-llama/llama-3.1-70b-instruct:free - Strong general purpose

For Paid Usage (Higher Quality/Speed)

| Model | Price (per 1M tokens) | Best For |
|---|---|---|
| anthropic/claude-3.5-sonnet | $3 in / $15 out | Highest quality observations |
| google/gemini-2.0-flash | $0.075 in / $0.30 out | Fast, cost-effective |
| openai/gpt-4o | $2.50 in / $10 out | GPT-4 quality |
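As a rough worked example from the prices above: a request with 2,500 input tokens and 1,200 output tokens on claude-3.5-sonnet costs 2500/1M × $3 + 1200/1M × $15 ≈ $0.026. A sketch (prices are hard-coded from the table, so treat them as a snapshot):

```typescript
// Estimate the USD cost of one request given per-million-token prices.
function estimateCostUSD(
  inputTokens: number,
  outputTokens: number,
  pricePerMillionIn: number,
  pricePerMillionOut: number
): number {
  return (inputTokens / 1_000_000) * pricePerMillionIn +
         (outputTokens / 1_000_000) * pricePerMillionOut;
}

// claude-3.5-sonnet at $3 in / $15 out:
const cost = estimateCostUSD(2500, 1200, 3, 15); // 0.0075 + 0.018 = 0.0255
```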

Context Window Management

The OpenRouter agent implements intelligent context management to prevent runaway costs:

Automatic Truncation

The agent uses a sliding window strategy:
  1. Checks if message count exceeds MAX_CONTEXT_MESSAGES (default: 20)
  2. Checks if estimated tokens exceed MAX_TOKENS (default: 100,000)
  3. If limits exceeded, keeps most recent messages only
  4. Logs warnings with dropped message counts

Token Estimation

  • Conservative estimate: 1 token ≈ 4 characters
  • Used for proactive context management
  • Actual usage logged from API response
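The estimation heuristic and sliding-window truncation described above can be sketched as follows (function names are illustrative, not claude-mem's actual code):

```typescript
interface Message { role: string; content: string; }

// Conservative heuristic from the docs: 1 token is roughly 4 characters.
function estimateTokens(messages: Message[]): number {
  const chars = messages.reduce((n, m) => n + m.content.length, 0);
  return Math.ceil(chars / 4);
}

// Keep only the most recent messages when either limit is exceeded.
function truncateHistory(
  messages: Message[],
  maxMessages = 20,    // CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES
  maxTokens = 100_000  // CLAUDE_MEM_OPENROUTER_MAX_TOKENS
): Message[] {
  let window = messages.slice(-maxMessages);
  while (window.length > 1 && estimateTokens(window) > maxTokens) {
    window = window.slice(1); // drop the oldest message still in the window
  }
  return window;
}
```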

Cost Tracking

Logs include detailed usage information:
OpenRouter API usage: {
  model: "xiaomi/mimo-v2-flash:free",
  inputTokens: 2500,
  outputTokens: 1200,
  totalTokens: 3700,
  estimatedCostUSD: "0.00",
  messagesInContext: 8
}

Provider Switching

You can switch between providers at any time:
  • No restart required: Changes take effect on the next observation
  • Conversation history preserved: When switching mid-session, the new provider sees the full conversation context
  • Seamless transition: All providers use the same observation format

Switching via UI

  1. Open Settings in the viewer
  2. Change the AI Provider dropdown
  3. The next observation will use the new provider

Switching via Settings File

{
  "CLAUDE_MEM_PROVIDER": "openrouter"
}

Fallback Behavior

If OpenRouter encounters errors, claude-mem automatically falls back to the Claude Agent SDK.

Triggers fallback:
  • Rate limiting (HTTP 429)
  • Server errors (HTTP 500, 502, 503)
  • Network issues (connection refused, timeout)
  • Generic fetch failures
Does not trigger fallback:
  • Missing API key (logs warning, uses Claude from start)
  • Invalid API key (fails with error)
When fallback occurs:
  1. A warning is logged
  2. Any in-progress messages are reset to pending
  3. Claude SDK takes over with the full conversation context
Fallback is transparent: observations continue processing without interruption, with the full conversation context preserved.
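The trigger rules above amount to a simple retryability check on the failure; a sketch (`isFallbackError` is a hypothetical name, not claude-mem's actual function):

```typescript
// Decide whether an OpenRouter failure should trigger fallback to Claude.
// Rate limiting, server errors, and network failures fall back;
// an invalid key (e.g. HTTP 401) is a hard error and does not.
function isFallbackError(httpStatus: number | null): boolean {
  if (httpStatus === null) return true;  // network issue / generic fetch failure
  if (httpStatus === 429) return true;   // rate limited
  if ([500, 502, 503].includes(httpStatus)) return true; // server errors
  return false;
}
```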

Multi-Turn Conversation Support

The OpenRouter agent maintains full conversation history across API calls:
Session Created

Load Pending Messages (observations from queue)

For each message:
  → Add to conversation history
  → Call OpenRouter API with FULL history
  → Parse XML response
  → Store observations in database
  → Sync to Chroma vector DB

Session complete
This enables:
  • Coherent multi-turn exchanges
  • Context preservation across observations
  • Seamless provider switching mid-session
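The flow above can be sketched as a loop that threads the full history through every call (`callOpenRouter` and `storeObservations` are placeholders for the real API and persistence steps):

```typescript
interface ChatMessage { role: "system" | "user" | "assistant"; content: string; }

async function processSession(
  pending: string[],  // observations loaded from the queue
  systemPrompt: string,
  callOpenRouter: (history: ChatMessage[]) => Promise<string>,
  storeObservations: (xmlReply: string) => Promise<void>
): Promise<void> {
  const history: ChatMessage[] = [{ role: "system", content: systemPrompt }];
  for (const msg of pending) {
    history.push({ role: "user", content: msg });
    const reply = await callOpenRouter(history); // FULL history on every call
    history.push({ role: "assistant", content: reply });
    await storeObservations(reply); // parse XML, store in DB, sync to Chroma
  }
}
```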

Troubleshooting

“OpenRouter API key not configured”

Either:
  • Set CLAUDE_MEM_OPENROUTER_API_KEY in ~/.claude-mem/settings.json, or
  • Set the OPENROUTER_API_KEY environment variable

Rate Limiting

Free models may have rate limits during peak usage. If you hit rate limits:
  • Claude-mem automatically falls back to Claude SDK
  • Consider switching to a different free model
  • Add credits for premium model access

Model Not Found

Verify the model ID is correct:
  • Check OpenRouter Models for current availability
  • Use the :free suffix for free model variants
  • Model IDs are case-sensitive

High Token Usage Warning

If you see warnings about high token usage (>50,000 per request):
  • Reduce CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES
  • Reduce CLAUDE_MEM_OPENROUTER_MAX_TOKENS
  • Consider a model with a larger context window

Connection Errors

If you see connection errors:
  • Check your internet connection
  • Verify OpenRouter service status at status.openrouter.ai
  • The agent will automatically fall back to Claude

API Details

OpenRouter uses an OpenAI-compatible REST API.

Endpoint: https://openrouter.ai/api/v1/chat/completions

Headers:
Authorization: Bearer {apiKey}
HTTP-Referer: https://github.com/thedotmack/claude-mem
X-Title: claude-mem
Content-Type: application/json
Request Format:
{
  "model": "xiaomi/mimo-v2-flash:free",
  "messages": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."}
  ],
  "temperature": 0.3,
  "max_tokens": 4096
}
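Putting the endpoint, headers, and request body together, a request can be assembled like this (`buildChatRequest` is a hypothetical helper that just mirrors what is documented above):

```typescript
interface ChatMessage { role: string; content: string; }

// Build a fetch-ready request for OpenRouter's OpenAI-compatible endpoint.
function buildChatRequest(apiKey: string, model: string, messages: ChatMessage[]) {
  return {
    url: "https://openrouter.ai/api/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiKey}`,
        "HTTP-Referer": "https://github.com/thedotmack/claude-mem",
        "X-Title": "claude-mem",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model, messages, temperature: 0.3, max_tokens: 4096 }),
    },
  };
}

// Usage:
//   const { url, init } = buildChatRequest(key, "xiaomi/mimo-v2-flash:free", msgs);
//   const res = await fetch(url, init);
```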

Comparing Providers

| Feature | Claude (SDK) | Gemini | OpenRouter |
|---|---|---|---|
| Cost | Pay per token | Free tier + paid | Free models + paid |
| Models | Claude only | Gemini only | 100+ models |
| Quality | Highest | High | Varies by model |
| Rate limits | Based on tier | 5-4000 RPM | Varies by model |
| Fallback | N/A (primary) | → Claude | → Claude |
| Setup | Automatic | API key required | API key required |
Recommendation: Start with OpenRouter’s free xiaomi/mimo-v2-flash:free model for zero-cost observation extraction. If you need higher quality or encounter rate limits, switch to Claude or add OpenRouter credits for premium models.

Next Steps