MetaMCP Session-Pinning Proxy
oleks edited this page 2026-05-16 17:15:47 +03:00

Status: shipped 2026-05-16 · Result: ≈1.7 GiB RAM saved on emmett at ~10 Claude sessions · Tracking: oleks/emmett#2 (closed), milestone oleks/emmett#36, umbrella oleks/emmett#13.

Problem: per-session MCP backend fan-out

metamcp aggregates the cluster plugin's stdio MCPs (gitea, kubernetes, prometheus, loki, woodpecker) behind one HTTP endpoint (127.0.0.1:12009). But:

  • metamcp's McpServerPool keys backend child processes by server-minted sessionId (apps/backend/src/lib/metamcp/mcp-server-pool.ts). The HTTP layer mints one via new StreamableHTTPServerTransport({ sessionIdGenerator: randomUUID }) whenever no mcp-session-id header is present.
  • Claude Code never sends mcp-session-id (upstream bug anthropics/claude-code#41836, open).
  • So every Claude Code session → a fresh random sessionId → its own dedicated backend child set.
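The fan-out can be reproduced with a toy model. This is not metamcp's code: `BackendPool`, `handle_request`, and the fixed five-name child set are hypothetical stand-ins. The model assumes only the two behaviours described on this page: a request without an mcp-session-id header mints a fresh server-side id and spawns a full backend set, while an id the server never minted is rejected with 404.

```python
import uuid

BACKEND_SET = ("gitea", "kubernetes", "prometheus", "loki", "woodpecker")


class BackendPool:
    """Toy model of a pool keyed by server-minted session id."""

    def __init__(self):
        self.sets = {}  # session id -> backend child set for that session

    def handle_request(self, session_id=None):
        """Return (http_status, session_id) for one inbound request."""
        if session_id is None:
            # No mcp-session-id header: mint a fresh id, spawn a full set.
            session_id = str(uuid.uuid4())
            self.sets[session_id] = BACKEND_SET
            return 200, session_id
        if session_id not in self.sets:
            # Client-chosen id the server never minted: rejected, not created.
            return 404, session_id
        return 200, session_id  # known server-minted id: reuse its set


# Ten Claude Code sessions, none sending mcp-session-id: ten backend sets.
fanned_out = BackendPool()
for _ in range(10):
    fanned_out.handle_request()
print(len(fanned_out.sets))  # 10

# Pinning: mint once, then present that id on behalf of every session.
pinned = BackendPool()
_, pinned_sid = pinned.handle_request()
for _ in range(10):
    pinned.handle_request(pinned_sid)
print(len(pinned.sets))  # 1
print(pinned.handle_request("client-chosen-id")[0])  # 404 (why Path A fails)
```

The same model also shows why the fix has to live in front of metamcp: only an id that metamcp minted itself is accepted, so the proxy must obtain one through a real initialize and reuse it.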

Measured on emmett: ~10 sessions ≈ 75 procs / 2043 MiB RSS, almost entirely duplicate MCP children (one full gitea/k8s/prometheus/loki/woodpecker set per session, ≈192 MiB each).

Approaches ruled out — do not re-attempt

Approach | Verdict
Path A: client sends a stable Mcp-Session-Id so metamcp self-pools | DEAD. metamcp returns 404 for any client-chosen sessionId (it only looks up server-minted ids and never creates a session from a client value), and Claude Code drops the header anyway (#41836).
Bounded M-session pool (M backends, M ≪ N) | UNNECESSARY. It rested on the false belief that one backend serializes concurrent calls. Milestone oleks/emmett#35, closed.
Patch metamcp internals (mcp-server-pool.ts) | MOOT. There is no serialization point inside metamcp: metamcp with a single session pipelines ≥10 concurrent tools/call (the SDK's protocol.js dispatches each request via Promise.resolve().then(...); the stdio transport's send() is fire-and-forget; streams are mapped per request id). oleks/emmett#13, re-scoped.
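The "MOOT" verdict rests on per-request-id multiplexing. Below is a Python analogue of that mechanism, not the TypeScript SDK's code; `JsonRpcSession` and the simulated `fake_tool_server` are hypothetical. One shared session keeps a map from JSON-RPC id to a pending future, sends without waiting for the reply (fire-and-forget), and routes each response to its future by id, so ten calls overlap instead of queueing.

```python
import asyncio


class JsonRpcSession:
    """Sketch: many in-flight JSON-RPC requests over one shared session."""

    def __init__(self, backend):
        self._backend = backend  # async callable: request dict -> response dict
        self._pending = {}       # request id -> future awaiting that response
        self._tasks = set()      # keep delivery tasks referenced until done
        self._next_id = 0

    def _send(self, request):
        # Fire-and-forget: schedule delivery and return without awaiting it.
        task = asyncio.get_running_loop().create_task(self._deliver(request))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)

    async def _deliver(self, request):
        response = await self._backend(request)
        # Route the response to the caller waiting on this id.
        self._pending.pop(response["id"]).set_result(response)

    async def call(self, method, params):
        self._next_id += 1
        request = {"jsonrpc": "2.0", "id": self._next_id,
                   "method": method, "params": params}
        future = asyncio.get_running_loop().create_future()
        self._pending[request["id"]] = future
        self._send(request)
        return await future


async def demo():
    finished = []

    async def fake_tool_server(request):
        # Hypothetical backend: lower ids take longer, so replies arrive out of order.
        await asyncio.sleep(0.05 * (10 - request["id"]))
        finished.append(request["id"])
        return {"jsonrpc": "2.0", "id": request["id"], "result": {"ok": True}}

    session = JsonRpcSession(fake_tool_server)
    loop = asyncio.get_running_loop()
    start = loop.time()
    results = await asyncio.gather(
        *(session.call("tools/call", {"n": n}) for n in range(10)))
    return results, finished, loop.time() - start


results, finished, elapsed = asyncio.run(demo())
print([r["id"] for r in results])  # ids 1..10 in call order: each caller got its own reply
print(finished)                    # ids 10 down to 1: the order replies completed
print(f"{elapsed:.2f}s")           # roughly 0.45 s; serialized would be about 2.25 s
```

Wall time tracks the slowest call rather than the sum of all calls, which is the same shape as the ≈2.5× (not 10×) measurement reported below.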

⚠️ Folklore warning. Issue history (oleks/emmett#2 comments 1149/1162, milestone #35, commit 48bf50f) once concluded the single-session/front-proxy approach was "fatal — 6 concurrent → 3× HTTP 502". That was a FALSE NEGATIVE — a cold-start test-harness artifact (see "The one real bug"), not metamcp serialization. Reconciliation evidence: oleks/emmett#2 comment 1189. Do not resurrect "single backend serializes / bounded pool needed."

The solution: M=1 single-session pinning proxy

A small localhost passthrough in front of metamcp:

  1. At startup, performs one initialize → captures metamcp's server-minted mcp-session-id → sends notifications/initialized.
  2. Stamps that one pinned sessionId onto every inbound connection.
  3. metamcp's existing per-session pool therefore reuses one backend child set for all Claude sessions.
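Steps 1 and 2 can be sketched as below. The helper names (`establish_pinned_session`, `stamp`), the injected `post` callable, and the clientInfo name are hypothetical; the real proxy_session_pin.py may be structured differently. The sketch assumes the MCP Streamable HTTP flow: on initialize the server returns its session id in an mcp-session-id response header, and the client then sends notifications/initialized carrying that header.

```python
import json

UPSTREAM = "http://127.0.0.1:12009/"  # metamcp endpoint from this page
BASE_HEADERS = {
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
}


def stamp(headers, session_id):
    """Copy headers, replacing any client-supplied session id with the pinned one."""
    out = {k: v for k, v in headers.items() if k.lower() != "mcp-session-id"}
    out["mcp-session-id"] = session_id
    return out


def establish_pinned_session(post):
    """initialize -> capture mcp-session-id -> notifications/initialized.

    `post(url, headers, body) -> (status, headers, body)` is a hypothetical
    injected HTTP helper, so the handshake logic runs without a network.
    """
    init = {
        "jsonrpc": "2.0", "id": 1, "method": "initialize",
        "params": {"protocolVersion": "2025-06-18", "capabilities": {},
                   "clientInfo": {"name": "session-pin-sketch", "version": "1"}},
    }
    status, headers, _ = post(UPSTREAM, dict(BASE_HEADERS), json.dumps(init))
    # HTTP header names are case-insensitive; normalise before the lookup.
    session_id = {k.lower(): v for k, v in headers.items()}.get("mcp-session-id")
    if status != 200 or not session_id:
        raise RuntimeError(f"initialize failed: HTTP {status}, no session id")
    note = {"jsonrpc": "2.0", "method": "notifications/initialized"}
    post(UPSTREAM, stamp(BASE_HEADERS, session_id), json.dumps(note))
    return session_id


# Usage with a fake upstream standing in for metamcp.
sent = []


def fake_post(url, headers, body):
    message = json.loads(body)
    sent.append((headers, message))
    if message["method"] == "initialize":
        return 200, {"Mcp-Session-Id": "server-minted-123"}, "{}"
    return 202, {}, ""


pinned = establish_pinned_session(fake_post)
inbound = {"Accept": "application/json, text/event-stream"}  # Claude sends no session id
print(stamp(inbound, pinned))
```

Stamping replaces rather than merges the header, so even a client that did send its own id would still land on the one pinned session.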

Why it's correct:

  • metamcp pools by server-minted sessionId — feed it one, get one backend set.
  • A single metamcp session pipelines concurrent tools/call end-to-end. This was proven both with a raw gitea-mcp child and through metamcp: 10 concurrent interleaved calls return in ≈2.5× a single-call latency (not 10×), with responses arriving out of order by id and 0/10 HTTP 502.
  • Routed upstreams are session-stateless (gitea/kubernetes/prometheus/loki/woodpecker; classified in oleks/emmett#5) so sharing one session across Claude sessions is correctness-safe. Stateful MCPs (e.g. chrome) are not behind metamcp and unaffected.

The one real bug: cold-start race

The only genuine defect the investigation found (everything else was the false negative): N concurrent first-contact initialize handshakes raced both the proxy's lazy mint-once path and the 20 s background keepalive against a still-cold pinned session. That race is what produced the original, misattributed "502 storm".

Fixed by:

  • Pre-warm: the pinned session is established synchronously at process start, before the listener accepts any connection — the first client never triggers a lazy mint.
  • Single-flight _ensure_pinned: one dedicated establishment primitive (_establish_lock + _establish_done condition); concurrent first-contact/remint callers wait once on the condition. N cold handshakes cost ONE upstream handshake. The handshake holds no lock — the steady-state hot path is unchanged (lock taken only to read sid/gen, released before the upstream call → concurrent tools/call still pipeline lock-free).
  • Keepalive deferral: the probe skips its tick entirely if an establishment is in progress — it can never invalidate a session mid-establishment.
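The single-flight fix and keepalive deferral can be sketched with a threading.Condition. Only `_ensure_pinned`, `_establish_lock`, and `_establish_done` come from this page; `PinnedSession`, the generation-counter remint check, `_keepalive_tick`, and the `handshake`/`probe` callables are assumptions for illustration, and the real proxy may well use asyncio instead of threads.

```python
import threading


class PinnedSession:
    """Single-flight establishment of one pinned upstream session (sketch)."""

    def __init__(self, handshake):
        self._handshake = handshake  # hypothetical: () -> new server-minted sid
        self._establish_lock = threading.Lock()
        self._establish_done = threading.Condition(self._establish_lock)
        self._establishing = False
        self._sid = None
        self._gen = 0  # bumped on every successful (re)mint

    def current(self):
        # Hot path: hold the lock only to read sid/gen, never across I/O.
        with self._establish_lock:
            return self._sid, self._gen

    def _ensure_pinned(self, stale_gen=None):
        """Return (sid, gen), minting only if there is no session yet or the
        caller's `stale_gen` is still the current generation."""
        with self._establish_lock:
            while self._establishing:
                self._establish_done.wait()  # another caller is minting
            if self._sid is not None and self._gen != stale_gen:
                return self._sid, self._gen
            self._establishing = True
        try:
            sid = self._handshake()  # upstream round trip with no lock held
        except BaseException:
            with self._establish_lock:
                self._establishing = False
                self._establish_done.notify_all()  # let a waiter retry
            raise
        with self._establish_lock:
            self._sid, self._gen = sid, self._gen + 1
            self._establishing = False
            self._establish_done.notify_all()
            return self._sid, self._gen

    def _keepalive_tick(self, probe):
        """One keepalive pass; `probe(sid) -> bool` is hypothetical."""
        with self._establish_lock:
            if self._establishing or self._sid is None:
                return "skipped"  # never probe a session mid-establishment
            sid, gen = self._sid, self._gen
        if probe(sid):
            return "ok"
        self._ensure_pinned(stale_gen=gen)  # no-op if someone already reminted
        return "reminted"


# Demo: 10 concurrent cold first contacts, one slow upstream handshake.
minted = []
started, release = threading.Event(), threading.Event()


def slow_handshake():
    started.set()
    release.wait(timeout=5)
    minted.append(len(minted) + 1)
    return f"sid-{len(minted)}"


pin = PinnedSession(slow_handshake)
threads = [threading.Thread(target=pin._ensure_pinned) for _ in range(10)]
for t in threads:
    t.start()
started.wait(timeout=5)
tick_during = pin._keepalive_tick(lambda sid: True)  # establishment in flight
release.set()
for t in threads:
    t.join()
print(tick_during, len(minted), pin.current())  # skipped 1 ('sid-1', 1)
tick_dead = pin._keepalive_tick(lambda sid: False)  # probe reports a dead session
print(tick_dead, pin.current())  # reminted ('sid-2', 2)
```

In this sketch, pre-warm corresponds to calling `_ensure_pinned()` once at process start, before the listening socket is opened. The generation check lets a caller that saw the old session fail request a remint without causing a second mint when another caller has already reminted.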

Implementation & wiring

The proxy and its service live in the oleks/emmett repo (it is host-local to emmett; metamcp itself is the packages/metamcp.nix here in flake-hub):

Component | Location
Proxy | experiments/mcp-stdio-mux/proxy_session_pin.py (M=1; pre-warm + single-flight + keepalive deferral), commit d4e14f5
NixOS service | nixos/metamcp-session-proxy.nix: systemd unit on localhost :12011, After/Wants metamcp.service, M=1 (no poolSize), commit dbc6aa4
Cutover | oleks/claude-plugin-cluster .mcp.json: url → http://127.0.0.1:12011/, commit ac96018

metamcp (packages/metamcp.nix 2.4.22) and its store path were never modified — the proxy sits in front.

Operations

  • Liveness: curl -s -X POST http://127.0.0.1:12011/ -H 'Content-Type: application/json' -H 'Accept: application/json, text/event-stream' -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-06-18","capabilities":{},"clientInfo":{"name":"x","version":"1"}}}' → HTTP 200 + an MCP initialize result.
  • Service: systemctl status metamcp-session-proxy (and metamcp must be active — proxy Wants it).
  • Memory check: one pinned-session backend set ≈ 5 procs / ~192 MiB; you should see one set regardless of how many Claude sessions are running.
  • Acceptance gate (for any future change): ≥8 concurrent tools/call with 0 HTTP 502, latency ≤ ~2× direct-metamcp baseline, sustained over minutes including a wave fired at service cold-start (the exact shape that produced the original false negative).
  • Scope: only Claude Code sessions started after the .mcp.json cutover (ac96018) use the proxy; pre-existing long-running sessions stay on direct stdio until they restart (expected, no action).
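The acceptance gate can be encoded as a small check over per-wave results. `Wave`, `passes_gate`, and the sample latencies are hypothetical and illustrative, not measurements; the thresholds (≥8 concurrent, 0 HTTP 502, mean latency ≤ 2× the direct-metamcp baseline, at least one wave fired at cold start) come from the gate above. The "sustained over minutes" duration is left to whoever collects the waves.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Wave:
    concurrency: int          # tools/call requests fired at once
    http_502: int             # responses that came back as HTTP 502
    latencies_s: list = field(default_factory=list)  # per-call latency, seconds
    cold_start: bool = False  # fired right after service start


def passes_gate(waves, baseline_mean_s, max_ratio=2.0, min_concurrency=8):
    """Return (ok, reason) for a run of waves against a direct-metamcp baseline."""
    if not any(w.cold_start for w in waves):
        return False, "no wave fired at cold start"
    for i, w in enumerate(waves):
        if w.concurrency < min_concurrency:
            return False, f"wave {i}: only {w.concurrency} concurrent"
        if w.http_502:
            return False, f"wave {i}: {w.http_502} x HTTP 502"
    latencies = [s for w in waves for s in w.latencies_s]
    if not latencies:
        return False, "no latency samples"
    sustained = mean(latencies)
    if sustained > max_ratio * baseline_mean_s:
        return False, f"mean {sustained:.1f} s exceeds {max_ratio}x baseline"
    return True, f"mean {sustained:.1f} s vs baseline {baseline_mean_s:.1f} s"


# Illustrative numbers only (not measurements): 1 cold-start wave + 9 more.
waves = [Wave(10, 0, [3.2] * 10, cold_start=True)]
waves += [Wave(10, 0, [3.2] * 10) for _ in range(9)]
ok, reason = passes_gate(waves, baseline_mean_s=4.6)
print(ok, reason)

waves[0].http_502 = 3  # any HTTP 502 fails the gate
bad_ok, bad_reason = passes_gate(waves, baseline_mean_s=4.6)
print(bad_ok, bad_reason)
```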

Result

Metric | Before (N fan-out) | After (M=1 pinned)
Backend procs @ ~10 sessions | ~75 | ~5
RSS | ~2043 MiB | ~192 MiB
Saving | | ≈1.7 GiB

Sustained gate: 10 waves × 10 concurrent over 302 s. Every wave had 0 errors and 0 HTTP 502; the cold-start wave had 0/10 HTTP 502; the sustained mean latency was 3.2 s, below the 4.6 s direct baseline; and exactly 1 session mint occurred across the cold-start wave and the 5-minute sustained run.

References