Hosted Server (Beta)
The hosted server is the cloud side of claude-mem: a Postgres-backed HTTP service (/v1) plus a separate BullMQ generation worker. Where the local plugin keeps
memory in ~/.claude-mem/claude-mem.db on your machine, the hosted server keeps
it per team and per project in Postgres, and exposes it back to any MCP
client over an authenticated link.
Three capabilities landed together and are documented here:
Remote MCP recall
Paste an authenticated link into Claude Code to recall your cloud memory —
read-only, team/project-scoped.
Paid-readiness
Opt-in rate limiting, monthly request/token quotas, and usage metering —
the guards a paid tier needs.
Data deletion
Right-to-erasure: forget a single memory, or purge everything captured for a
project.
The shape of the system
(team_id, project_id). An API key carries a team
(always) and an optional project scope; that scoping bounds every read,
write, and delete.
Authentication
SetCLAUDE_MEM_AUTH_MODE=api-key and send Authorization: Bearer <key> on every
request. Scopes gate access:
- Read endpoints (search, context, recall, usage) require
memories:read. - Write endpoints (ingest, key issuance, deletion) require
memories:write.
api_keys table; the raw cm_... value
is shown exactly once, at mint time.
Remote authenticated MCP recall
/v1/mcp is a streamable-HTTP MCP server. It’s
the secure link a user pastes into Claude Code to recall their cloud memory. It is
read-only and authenticated by the same API key as the REST routes
(memories:read); the key’s team — and project, if the key is project-scoped —
bounds every read.
| Tool | Arguments | Returns |
|---|---|---|
search | { projectId, query, limit? } | Matching observations (full-text search). |
context | { projectId, query, limit? } | Observations plus a concatenated context string ready for prompt injection. |
recent | { projectId, limit? } | The newest observations for a project. |
The transport is stateless — one MCP server + transport per request — so it
needs no session affinity behind a load balancer. Mutating tools are
intentionally absent: a pasted recall link can never write or delete. Every read
is written to
audit_log as an observation.read event, the same as
POST /v1/search.Connecting a client: key issuance + connect
Two routes turn “I have a server” into “Claude Code is recalling my cloud memory”:-
POST /v1/keys(requiresmemories:write) mints a read-only API key for the caller’s team and returns a paste-ready connect command. The raw key appears once. Body:{ "expiresInDays"?: number }. Minting requires write scope so a read-only key can’t escalate itself into more keys. -
GET /v1/connect(requiresmemories:read) returns the same command with a<YOUR_API_KEY>placeholder — a GET never mints. ThemcpUrlis built fromCLAUDE_MEM_PUBLIC_URL(recommended when behind a proxy or load balancer) or, failing that, the request host.
Paid-readiness: rate limiting, quotas, metering
These guards run after auth and are opt-in via environment variables. Unset (the default) means no rate limit, no quota, and no metering — behavior is identical to a server without them. Every guard fails open: a backing-store error never blocks a legitimate request.| Env var | Effect | Response when exceeded |
|---|---|---|
CLAUDE_MEM_RATE_LIMIT_PER_MIN | Max requests per API key per minute. | 429 with Retry-After and X-RateLimit-* headers. |
CLAUDE_MEM_MONTHLY_REQUEST_CAP | Max requests per team per calendar month (UTC). | 402 quota_exceeded. |
CLAUDE_MEM_MONTHLY_TOKEN_CAP | Max provider tokens per team per month. Gates writes only — reads stay open so a team over budget can still recall. | 402 at the cap. |
CLAUDE_MEM_USAGE_METERING=1 | Records one request usage event per authenticated call (fire-and-forget). | — |
usage_events table from
the generation worker, so usage reflects real provider spend, not just HTTP calls.
GET /v1/usage returns the caller team’s per-kind totals for the current month:
“Gates writes only” is deliberate: ingestion is what drives generation, which is
what costs tokens. A team that blows its token budget can still read its
existing memory — you never lock someone out of their own data over billing.
Data deletion (forget)
Right-to-erasure. Both routes requirememories:write and are scoped to the
caller’s team. Both write an audit_log entry.
-
DELETE /v1/memories/:id— delete a single observation; itsobservation_sourcescascade. Returns404if no such observation exists for the team. Audited asobservation.deleted. -
DELETE /v1/projects/:projectId/memory— purge all captured content for a project in one transaction: observations, raw agent events, server sessions, and generation jobs. The project shell (config/membership) is kept so the team can keep using it. Returns per-tablecounts. Returns404if the project doesn’t belong to the team. Audited asproject.memory_purged.
Deletion is team-scoped at the SQL layer, so a key can only ever erase its own
team’s data — a cross-team or nonexistent
projectId returns 404 rather than a
misleading success.Event generation semantics
Ingestion (POST /v1/events) accepts two query flags that control observation
generation:
generate=false— write the event but do not enqueue a generation job.wait=true— return thegenerationJobdescriptor so callers can pollGET /v1/jobs/:idfor completion.
wait=true, the response includes the new event row plus a best-effort
generationJob field. With wait=true, that field is always populated (or null
only when generation was explicitly disabled). The actual provider call happens in
the separate BullMQ worker (claude-mem server worker start) — the HTTP path
never blocks on a provider response.
Endpoint reference
All endpoints are mounted under/v1; legacy worker routes remain under /api.
What’s solid vs. what’s coming
Solid today: Postgres-backed multi-tenant storage, api-key auth with
read/write scopes, the
/v1/mcp recall link, opt-in rate limiting + quotas +
metering, and audited data deletion. All covered by the Postgres-gated e2e suite.Still being built (UX / devex): a web dashboard for the first-key bootstrap and
key management, self-serve onboarding, a billing/plan UI on top of the metering
primitives, and a smoother “connect Claude Code to my cloud memory” flow than
pasting a CLI command. These are the next focus — the primitives above are the
foundation they’ll sit on.
