MetaMCP Session-Pinning Proxy
Status: shipped 2026-05-16 · Result: ≈1.7 GiB RAM saved on emmett at ~10 Claude sessions · Tracking: oleks/emmett#2 (closed), milestone oleks/emmett#36, umbrella oleks/emmett#13.
Problem: per-session MCP backend fan-out
metamcp aggregates the cluster plugin's stdio MCPs (gitea, kubernetes, prometheus, loki, woodpecker) behind one HTTP endpoint (127.0.0.1:12009). But:
- metamcp's `McpServerPool` keys backend child processes by server-minted `sessionId` (`apps/backend/src/lib/metamcp/mcp-server-pool.ts`; the HTTP layer mints via `new StreamableHTTPServerTransport({ sessionIdGenerator: randomUUID })` when no `mcp-session-id` header is present).
- Claude Code never sends `mcp-session-id` (upstream bug anthropics/claude-code#41836, open).
- So every Claude Code session → a fresh random `sessionId` → its own dedicated backend child set.
Measured on emmett: ~10 sessions ≈ 75 procs / 2043 MiB RSS, almost entirely duplicate MCP children (one full gitea/k8s/prometheus/loki/woodpecker set per session, ≈192 MiB each).
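The fan-out can be illustrated with a toy model of a pool keyed by session id. This is only a sketch: `ToyPool` and its methods are hypothetical names, not metamcp's code, and it counts only the five named backends, so its totals are not the measured process counts.

```python
import uuid

BACKENDS = ("gitea", "kubernetes", "prometheus", "loki", "woodpecker")


class ToyPool:
    """Toy stand-in for a backend pool keyed by server-minted session id."""

    def __init__(self):
        self.sets = {}  # session id -> backend names "spawned" for that session

    def connect(self, session_id=None):
        if session_id is None:
            # No header: mint a fresh id and give it its own backend set.
            session_id = str(uuid.uuid4())
            self.sets[session_id] = BACKENDS
        elif session_id not in self.sets:
            # Models the 404 for ids the server did not mint itself.
            raise LookupError("unknown session id")
        return session_id

    def process_count(self):
        return sum(len(backends) for backends in self.sets.values())


# Ten header-less clients: one backend set each.
fanout = ToyPool()
for _ in range(10):
    fanout.connect()
print(fanout.process_count())  # 50

# Ten clients all presenting one server-minted id: a single backend set.
pinned = ToyPool()
sid = pinned.connect()
for _ in range(10):
    pinned.connect(sid)
print(pinned.process_count())  # 5
```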
Approaches ruled out — do not re-attempt
| Approach | Verdict |
|---|---|
| Path A: client sends a stable `Mcp-Session-Id` so metamcp self-pools | DEAD. metamcp 404s any client-chosen sessionId (it only looks up server-minted ids, never creates one from a client value), and Claude Code drops the header anyway (#41836). |
| Bounded M-session pool (M backends, M≪N) | UNNECESSARY. Based on the false belief that one backend serializes concurrent calls. Milestone oleks/emmett#35, closed. |
| Patch metamcp internals (`mcp-server-pool.ts`) | MOOT. There is no in-metamcp serialization point. metamcp + one session pipelines ≥10 concurrent `tools/call` (SDK `protocol.js` dispatches each request via `Promise.resolve().then(...)`; stdio transport `send()` is fire-and-forget; per-request-id stream mapping). oleks/emmett#13, re-scoped. |
⚠️ Folklore warning. Issue history (oleks/emmett#2 comments 1149/1162, milestone #35, commit
48bf50f) once concluded the single-session/front-proxy approach was "fatal — 6 concurrent → 3× HTTP 502". That was a FALSE NEGATIVE — a cold-start test-harness artifact (see "The one real bug"), not metamcp serialization. Reconciliation evidence: oleks/emmett#2 comment 1189. Do not resurrect "single backend serializes / bounded pool needed."
The solution: M=1 single-session pinning proxy
A small localhost passthrough in front of metamcp:
- At startup, performs one `initialize` → captures metamcp's server-minted `mcp-session-id` → `notifications/initialized`.
- Stamps that one pinned sessionId onto every inbound connection.
- metamcp's existing per-session pool therefore reuses one backend child set for all Claude sessions.
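The handshake-then-stamp flow above can be sketched as follows. This is a minimal illustration, not the code in `proxy_session_pin.py`: the `post` callable, `establish_pinned_session`, `stamp`, and the `clientInfo` values are all hypothetical, and the transport (URL, `Accept` headers, SSE parsing) is assumed to live inside `post`.

```python
def initialize_body(request_id=1):
    """JSON-RPC initialize request; clientInfo values are placeholders."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-06-18",
            "capabilities": {},
            "clientInfo": {"name": "pin-proxy-sketch", "version": "0"},
        },
    }


INITIALIZED = {"jsonrpc": "2.0", "method": "notifications/initialized"}


def establish_pinned_session(post):
    """post(body, headers) -> (status, response_headers); returns the session id."""
    status, resp_headers = post(initialize_body(), {})
    if status != 200:
        raise RuntimeError(f"initialize failed with HTTP {status}")
    # HTTP header names are case-insensitive, so match on the lowered name.
    sid = next((v for k, v in resp_headers.items()
                if k.lower() == "mcp-session-id"), None)
    if not sid:
        raise RuntimeError("upstream returned no mcp-session-id")
    post(INITIALIZED, {"mcp-session-id": sid})
    return sid


def stamp(headers, sid):
    """Copy inbound headers, replacing any client session id with the pinned one."""
    out = {k: v for k, v in headers.items() if k.lower() != "mcp-session-id"}
    out["mcp-session-id"] = sid
    return out


# Usage against a fake upstream that records what it was sent.
calls = []

def fake_post(body, headers):
    calls.append((body["method"], headers.get("mcp-session-id")))
    return 200, {"Mcp-Session-Id": "sid-123"}

pinned_sid = establish_pinned_session(fake_post)
print(stamp({"Accept": "application/json"}, pinned_sid))
```

Stripping any inbound `mcp-session-id` before stamping is a choice made for this sketch; it keeps a client-supplied value from ever reaching the upstream.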
Why it's correct:
- metamcp pools by server-minted sessionId — feed it one, get one backend set.
- A single metamcp session pipelines concurrent `tools/call` end-to-end. This was proven with a raw `gitea-mcp` child and through metamcp: 10 concurrent interleaved calls return in ≈2.5× a single-call latency (not 10×), out-of-order by id, 0/10 HTTP 502.
- Routed upstreams are session-stateless (gitea/kubernetes/prometheus/loki/woodpecker; classified in oleks/emmett#5), so sharing one session across Claude sessions is correctness-safe. Stateful MCPs (e.g. chrome) are not behind metamcp and are unaffected.
The one real bug: cold-start race
The only genuine defect the investigation found (everything else was the false negative): N concurrent first-contact initialize handshakes racing the proxy's lazy mint-once path and the 20 s background keepalive against a still-cold pinned session → the original misattributed "502 storm".
Fixed by:
- Pre-warm: the pinned session is established synchronously at process start, before the listener accepts any connection — the first client never triggers a lazy mint.
- Single-flight `_ensure_pinned`: one dedicated establishment primitive (`_establish_lock` + `_establish_done` condition); concurrent first-contact/remint callers wait once on the condition. N cold handshakes cost ONE upstream handshake. The handshake holds no lock, and the steady-state hot path is unchanged (lock taken only to read sid/gen, released before the upstream call → concurrent `tools/call` still pipeline lock-free).
- Keepalive deferral: the probe skips its tick entirely if an establishment is in progress, so it can never invalidate a session mid-establishment.
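A minimal sketch of the single-flight and keepalive-deferral ideas, under stated assumptions: it uses one `threading.Condition` where the proxy is described as using `_establish_lock` plus an `_establish_done` condition, and `PinnedSession`, `handshake`, and `probe` are hypothetical names rather than the proxy's actual API.

```python
import threading
import time


class PinnedSession:
    """Single-flight holder for one upstream session id (illustrative only)."""

    def __init__(self, handshake):
        self._handshake = handshake          # hypothetical: () -> new session id
        self._cond = threading.Condition()   # stands in for lock + done-condition
        self._sid = None
        self._gen = 0                        # bumped on every successful mint
        self._establishing = False

    def ensure_pinned(self):
        with self._cond:
            while self._sid is None and self._establishing:
                self._cond.wait()            # followers wait for the leader
            if self._sid is not None:
                return self._sid, self._gen  # hot path: read sid/gen, release
            self._establishing = True        # this caller becomes the leader
        try:
            sid = self._handshake()          # upstream call, no lock held
        except BaseException:
            self._finish(None)
            raise
        return self._finish(sid)

    def _finish(self, sid):
        with self._cond:
            if sid is not None:
                self._sid, self._gen = sid, self._gen + 1
            self._establishing = False
            self._cond.notify_all()          # wake followers on success or failure
            return sid, self._gen

    def keepalive_tick(self, probe):
        """Run one probe; return False if the tick was skipped."""
        with self._cond:
            if self._establishing or self._sid is None:
                return False                 # defer: never race an establishment
            sid, gen = self._sid, self._gen
        if not probe(sid):                   # probe runs without the lock
            with self._cond:
                if gen == self._gen and not self._establishing:
                    self._sid = None         # next ensure_pinned() remints
        return True


mints = []

def slow_handshake():
    time.sleep(0.05)                         # simulate a cold upstream
    mints.append(1)
    return f"sid-{len(mints)}"

session = PinnedSession(slow_handshake)
results = []
threads = [threading.Thread(target=lambda: results.append(session.ensure_pinned()))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(mints), set(results))              # 1 {('sid-1', 1)}
```

Eight concurrent cold callers produce one handshake because only the first caller leaves the condition with `_establishing` set; the rest either wait on the condition or arrive after the id is already stored.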
Implementation & wiring
The proxy and its service live in the oleks/emmett repo (it is host-local to emmett; metamcp itself is the packages/metamcp.nix here in flake-hub):
| Component | Location |
|---|---|
| Proxy | experiments/mcp-stdio-mux/proxy_session_pin.py (M=1; pre-warm + single-flight + keepalive deferral) — commit d4e14f5 |
| NixOS service | nixos/metamcp-session-proxy.nix — systemd unit, localhost :12011, After/Wants metamcp.service, M=1 (no poolSize) — commit dbc6aa4 |
| Cutover | oleks/claude-plugin-cluster .mcp.json url → http://127.0.0.1:12011/ — commit ac96018 |
metamcp (packages/metamcp.nix 2.4.22) and its store path were never modified — the proxy sits in front.
Operations
- Liveness: `curl -s -X POST http://127.0.0.1:12011/ -H 'Content-Type: application/json' -H 'Accept: application/json, text/event-stream' -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-06-18","capabilities":{},"clientInfo":{"name":"x","version":"1"}}}'` → HTTP 200 + an MCP `initialize` result.
- Service: `systemctl status metamcp-session-proxy` (and `metamcp` must be active; the proxy `Wants` it).
- Memory check: one pinned-session backend set ≈ 5 procs / ~192 MiB; you should see one set regardless of how many Claude sessions are running.
- Acceptance gate (for any future change): ≥8 concurrent `tools/call` with 0 HTTP 502, latency ≤ ~2× direct-metamcp baseline, sustained over minutes including a wave fired at service cold-start (the exact shape that produced the original false negative).
- Scope: only Claude Code sessions started after the `.mcp.json` cutover (ac96018) use the proxy; pre-existing long-running sessions stay on direct stdio until they restart (expected, no action).
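The acceptance gate above could be checked with a small harness along these lines. This is a sketch, not the harness used for the recorded results: `run_wave` and `gate_passes` are hypothetical names, `call` is assumed to issue one `tools/call` and return its HTTP status, and comparing the mean latency against the baseline is an assumption about which statistic the gate uses.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_wave(call, concurrency=8):
    """Fire `concurrency` calls at once; return (statuses, latencies in seconds)."""
    def timed(_):
        start = time.perf_counter()
        status = call()                      # hypothetical: () -> HTTP status int
        return status, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed, range(concurrency)))
    return [s for s, _ in results], [d for _, d in results]


def gate_passes(statuses, latencies, baseline_s, factor=2.0):
    """True if no call returned 502 and mean latency is within factor x baseline."""
    mean = sum(latencies) / len(latencies)
    return 502 not in statuses and mean <= factor * baseline_s


# Usage with a stand-in for the real request.
def fake_call():
    time.sleep(0.01)
    return 200

statuses, latencies = run_wave(fake_call, concurrency=8)
print(statuses.count(502), gate_passes(statuses, latencies, baseline_s=1.0))
```

To reproduce the cold-start shape, the first wave would be fired immediately after restarting the proxy service, followed by repeated waves over several minutes.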
Result
| | Before (N-fan-out) | After (M=1 pinned) |
|---|---|---|
| Backend procs @ ~10 sessions | ~75 | ~5 |
| RSS | ~2043 MiB | ~192 MiB |
| Saving | — | ≈1.7 GiB |
Sustained gate: 10 waves × 10 concurrent over 302 s — every wave 0 err / 0 HTTP 502; cold-start wave 0/10 502; sustained mean 3.2 s ≤ 4.6 s direct baseline; exactly 1 session mint across cold + 5-min sustained.
References
- Outcome & before/after: oleks/emmett#2 (closed, comment 1213)
- False-negative correction: oleks/emmett#2 comment 1189
- Productionization phases: milestone oleks/emmett#36 (#14 cold-start, #15 service, #16 gate, #17 cutover, #18 close-out), umbrella oleks/emmett#13
- Ruled-out bounded pool: milestone oleks/emmett#35 (closed)
- Upstream blocker: anthropics/claude-code#41836