# OpenRouter Provider
Claude-mem supports OpenRouter as an alternative provider for observation extraction. OpenRouter provides a unified API to access 100+ models from different providers including Google, Meta, Mistral, DeepSeek, and many others—often with generous free tiers.
Free Models Available: OpenRouter offers several completely free models, making it an excellent choice for reducing observation extraction costs to zero while maintaining quality.
## Why Use OpenRouter?
- Access to 100+ models: Choose from models across multiple providers through one API
- Free tier options: Several high-quality models are completely free to use
- Cost flexibility: Pay-as-you-go pricing on premium models with no commitments
- Seamless fallback: Automatically falls back to Claude if OpenRouter is unavailable
- Hot-swappable: Switch providers without restarting the worker
- Multi-turn conversations: Full conversation history maintained across API calls
## Free Models on OpenRouter
OpenRouter actively supports democratizing AI access by offering free models. These are production-ready models suitable for observation extraction.
### Featured Free Models
| Model | ID | Parameters | Context | Best For |
|---|---|---|---|---|
| Xiaomi MiMo-V2-Flash | xiaomi/mimo-v2-flash:free | 309B (15B active, MoE) | 256K | Reasoning, coding, agents |
| Gemini 2.0 Flash | google/gemini-2.0-flash-exp:free | — | 1M | General purpose |
| Gemini 2.5 Flash | google/gemini-2.5-flash-preview:free | — | 1M | Latest capabilities |
| DeepSeek R1 | deepseek/deepseek-r1:free | 671B | 64K | Reasoning, analysis |
| Llama 3.1 70B | meta-llama/llama-3.1-70b-instruct:free | 70B | 128K | General purpose |
| Llama 3.1 8B | meta-llama/llama-3.1-8b-instruct:free | 8B | 128K | Fast, lightweight |
| Mistral Nemo | mistralai/mistral-nemo:free | 12B | 128K | Efficient performance |
Default Model: Claude-mem uses xiaomi/mimo-v2-flash:free by default—a 309B parameter mixture-of-experts model that ranks #1 on SWE-bench Verified and excels at coding and reasoning tasks.
### Free Model Considerations
- Rate limits: Free models may have stricter rate limits than paid models
- Availability: Free capacity depends on provider partnerships and demand
- Queue times: During peak usage, requests may be queued briefly
- Max tokens: Most free models support 65,536 completion tokens
All free models support:
- Tool use and function calling
- Temperature and sampling controls
- Stop sequences
- Streaming responses
## Getting an API Key

1. Go to OpenRouter
2. Sign in with Google, GitHub, or email
3. Navigate to API Keys
4. Click Create Key
5. Copy and securely store your API key
Free to start: No credit card required to create an account or use free models. Add credits only if you want to use premium models.
## Configuration

### Settings
| Setting | Values | Default | Description |
|---|---|---|---|
| CLAUDE_MEM_PROVIDER | claude, gemini, openrouter | claude | AI provider for observation extraction |
| CLAUDE_MEM_OPENROUTER_API_KEY | string | — | Your OpenRouter API key |
| CLAUDE_MEM_OPENROUTER_MODEL | string | xiaomi/mimo-v2-flash:free | Model identifier (see list above) |
| CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES | number | 20 | Max messages in conversation history |
| CLAUDE_MEM_OPENROUTER_MAX_TOKENS | number | 100000 | Token budget safety limit |
| CLAUDE_MEM_OPENROUTER_SITE_URL | string | — | Optional: URL for analytics attribution |
| CLAUDE_MEM_OPENROUTER_APP_NAME | string | claude-mem | Optional: app name for analytics |
### Using the Settings UI

1. Open the viewer at http://localhost:37777
2. Click the gear icon to open Settings
3. Under AI Provider, select OpenRouter
4. Enter your OpenRouter API key
5. Optionally select a different model
Settings are applied immediately—no restart required.
### Manual Configuration

Edit `~/.claude-mem/settings.json`:

```json
{
  "CLAUDE_MEM_PROVIDER": "openrouter",
  "CLAUDE_MEM_OPENROUTER_API_KEY": "sk-or-v1-your-key-here",
  "CLAUDE_MEM_OPENROUTER_MODEL": "xiaomi/mimo-v2-flash:free"
}
```
Alternatively, set the API key via environment variable:

```bash
export OPENROUTER_API_KEY="sk-or-v1-your-key-here"
```
The settings file takes precedence over the environment variable.
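The precedence rule can be sketched as a tiny resolver. This is an illustrative sketch, not claude-mem's actual code; the function name and argument shapes are assumptions:

```typescript
// Hypothetical sketch of API-key resolution: the settings-file value wins
// over the environment variable, per the precedence rule above.
function resolveApiKey(
  settings: Record<string, string | undefined>,
  env: Record<string, string | undefined>
): string | undefined {
  return settings["CLAUDE_MEM_OPENROUTER_API_KEY"] ?? env["OPENROUTER_API_KEY"];
}
```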
## Model Selection Guide

### For Free Usage (No Cost)
Recommended: xiaomi/mimo-v2-flash:free
- Best-in-class performance on coding benchmarks
- 256K context window handles large observations
- 65K max completion tokens
- Mixture-of-experts architecture (15B active parameters)
Alternatives:

- google/gemini-2.0-flash-exp:free - 1M context, Google’s flagship
- deepseek/deepseek-r1:free - Excellent reasoning capabilities
- meta-llama/llama-3.1-70b-instruct:free - Strong general purpose
### For Paid Usage (Higher Quality/Speed)

| Model | Price (per 1M tokens) | Best For |
|---|---|---|
| anthropic/claude-3.5-sonnet | $3 in / $15 out | Highest quality observations |
| google/gemini-2.0-flash | $0.075 in / $0.30 out | Fast, cost-effective |
| openai/gpt-4o | $2.50 in / $10 out | GPT-4 quality |
## Context Window Management

The OpenRouter agent implements intelligent context management to prevent runaway costs:
### Automatic Truncation

The agent uses a sliding-window strategy:

- Checks whether the message count exceeds MAX_CONTEXT_MESSAGES (default: 20)
- Checks whether the estimated token count exceeds MAX_TOKENS (default: 100,000)
- If either limit is exceeded, keeps only the most recent messages
- Logs a warning with the number of dropped messages
### Token Estimation
- Conservative estimate: 1 token ≈ 4 characters
- Used for proactive context management
- Actual usage logged from API response
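The estimation heuristic and the sliding-window truncation can be sketched together as follows. This is an illustrative sketch with hypothetical names, assuming only the documented defaults (4 characters per token, 20 messages, 100,000 tokens), not the agent's real implementation:

```typescript
interface ChatMessage {
  role: string;
  content: string;
}

// Conservative token estimate: roughly 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Sliding-window truncation: keep only the most recent messages so that
// both documented limits hold.
function truncateHistory(
  messages: ChatMessage[],
  maxMessages = 20,
  maxTokens = 100_000
): ChatMessage[] {
  let kept = messages.slice(-maxMessages);
  const total = (msgs: ChatMessage[]) =>
    msgs.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  // Drop the oldest kept message until the token budget fits.
  while (kept.length > 1 && total(kept) > maxTokens) {
    kept = kept.slice(1);
  }
  const dropped = messages.length - kept.length;
  if (dropped > 0) {
    console.warn(`Context truncated: dropped ${dropped} oldest message(s)`);
  }
  return kept;
}
```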
### Cost Tracking

Logs include detailed usage information:

```
OpenRouter API usage: {
  model: "xiaomi/mimo-v2-flash:free",
  inputTokens: 2500,
  outputTokens: 1200,
  totalTokens: 3700,
  estimatedCostUSD: "0.00",
  messagesInContext: 8
}
```
## Provider Switching
You can switch between providers at any time:
- No restart required: Changes take effect on the next observation
- Conversation history preserved: When switching mid-session, the new provider sees the full conversation context
- Seamless transition: All providers use the same observation format
### Switching via UI

1. Open Settings in the viewer
2. Change the AI Provider dropdown
3. The next observation will use the new provider
### Switching via Settings File

```json
{
  "CLAUDE_MEM_PROVIDER": "openrouter"
}
```
## Fallback Behavior
If OpenRouter encounters errors, claude-mem automatically falls back to the Claude Agent SDK:
Triggers fallback:
- Rate limiting (HTTP 429)
- Server errors (HTTP 500, 502, 503)
- Network issues (connection refused, timeout)
- Generic fetch failures
Does not trigger fallback:
- Missing API key (logs warning, uses Claude from start)
- Invalid API key (fails with error)
When fallback occurs:
- A warning is logged
- Any in-progress messages are reset to pending
- Claude SDK takes over with the full conversation context
Fallback is transparent: Your observations continue processing without interruption. The fallback preserves all conversation context.
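The fallback decision described above can be sketched as a single predicate. This is a hypothetical illustration of the documented rules, not the agent's actual error handling:

```typescript
// Retryable transport/server failures fall back to the Claude SDK;
// authentication problems do not. Field names here are assumptions.
function shouldFallbackToClaude(failure: {
  httpStatus?: number;    // set when the API responded with an error code
  networkError?: boolean; // connection refused, timeout, generic fetch failure
}): boolean {
  if (failure.networkError) return true;
  if (failure.httpStatus === 429) return true; // rate limited
  if ([500, 502, 503].includes(failure.httpStatus ?? 0)) return true; // server errors
  return false; // e.g. 401 invalid key fails with an error instead
}
```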
## Multi-Turn Conversation Support

The OpenRouter agent maintains the full conversation history across API calls:
```
Session Created
  ↓
Load Pending Messages (observations from queue)
  ↓
For each message:
  → Add to conversation history
  → Call OpenRouter API with FULL history
  → Parse XML response
  → Store observations in database
  → Sync to Chroma vector DB
  ↓
Session complete
```
This enables:
- Coherent multi-turn exchanges
- Context preservation across observations
- Seamless provider switching mid-session
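The per-message loop can be sketched as follows. `callApi` stands in for the (normally asynchronous) OpenRouter request, and all names here are hypothetical; the point is that the full history, not just the latest message, is sent on every call:

```typescript
type Turn = { role: "user" | "assistant"; content: string };

// Process queued observations, appending each to the running history and
// sending the FULL history on every call. Illustrative sketch only.
function processQueue(
  pending: string[],
  callApi: (history: Turn[]) => string
): Turn[] {
  const history: Turn[] = [];
  for (const observation of pending) {
    history.push({ role: "user", content: observation });
    const reply = callApi(history); // full history, not just the last message
    history.push({ role: "assistant", content: reply });
  }
  return history;
}
```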
## Troubleshooting

### Missing API Key

If no API key is configured, either:

- Set CLAUDE_MEM_OPENROUTER_API_KEY in ~/.claude-mem/settings.json, or
- Set the OPENROUTER_API_KEY environment variable
### Rate Limiting
Free models may have rate limits during peak usage. If you hit rate limits:
- Claude-mem automatically falls back to Claude SDK
- Consider switching to a different free model
- Add credits for premium model access
### Model Not Found

Verify the model ID is correct:

- Check OpenRouter Models for current availability
- Use the :free suffix for free model variants
- Model IDs are case-sensitive
### High Token Usage Warning

If you see warnings about high token usage (>50,000 per request):

- Reduce CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES
- Reduce CLAUDE_MEM_OPENROUTER_MAX_TOKENS
- Consider a model with a larger context window
### Connection Errors
If you see connection errors:
- Check your internet connection
- Verify OpenRouter service status at status.openrouter.ai
- The agent will automatically fall back to Claude
## API Details

OpenRouter uses an OpenAI-compatible REST API:

Endpoint: `https://openrouter.ai/api/v1/chat/completions`

Headers:

```
Authorization: Bearer {apiKey}
HTTP-Referer: https://github.com/thedotmack/claude-mem
X-Title: claude-mem
Content-Type: application/json
```
Request Format:

```json
{
  "model": "xiaomi/mimo-v2-flash:free",
  "messages": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."}
  ],
  "temperature": 0.3,
  "max_tokens": 4096
}
```
## Comparing Providers

| Feature | Claude (SDK) | Gemini | OpenRouter |
|---|---|---|---|
| Cost | Pay per token | Free tier + paid | Free models + paid |
| Models | Claude only | Gemini only | 100+ models |
| Quality | Highest | High | Varies by model |
| Rate limits | Based on tier | 5-4000 RPM | Varies by model |
| Fallback | N/A (primary) | → Claude | → Claude |
| Setup | Automatic | API key required | API key required |
Recommendation: Start with OpenRouter’s free xiaomi/mimo-v2-flash:free model for zero-cost observation extraction. If you need higher quality or encounter rate limits, switch to Claude or add OpenRouter credits for premium models.
## Next Steps