
Sanctum Cloud Proxy

Sanctum Cloud Proxy — A massive glowing secure gateway vault with five reinforced energy barriers protecting a data stream.

When your haus talks to the cloud, it shouldn’t just ask for an API key and hope for the best. The original sanctum-cloud-proxy was a ~100-line Python script that did the security checks by hand: fetching keys, tracking spend, managing fallbacks. It worked, but it kept the Python interpreter chained to our most critical networking path — every autonomous action waited on an interpreter start.

So we folded the whole hardening suite into the existing sanctum-proxy Rust crate. The Python proxy is gone; the native daemon (proxyd, launchd label com.sanctum.proxyd) does it all.

proxyd binds 0.0.0.0:4040 — all interfaces, so it answers on both 127.0.0.1 and the 10.0.0.1 Mac↔VM bridge. It intercepts every LLM API request and runs it through five layers of verification before forwarding to the cloud (Anthropic, OpenRouter, Gemini) or routing it to a local MLX seat.

A technical sketch of a five-layer vault shielding a central data node.

  1. Key Security — No API keys live in source, .env files, or Git. At runtime keychain.rs reads <PROVIDER>_API_KEY from the environment first (the launchd wrapper proxyd-launch injects GEMINI_API_KEY / OPENROUTER_API_KEY from ~/.sanctum/secrets/), then falls back to the macOS login keychain via the security-framework crate (service=<provider>-api-key, account=sanctum). A daemon must never block on a consent dialog, so a missing key fails closed — it logs once and returns empty, letting the request emit a clean auth error instead of popping an AppleScript prompt in the operator’s session at 3 a.m.
  2. Fallback Chain — If a provider returns a 429, a 5xx, or times out, proxyd walks a per-model fallback chain (the fallbacks: map in config.yaml) until something healthy answers, and the client never learns the first hop failed. The chain bottoms out on local MLX seats: the Codestral-22B-v0.1-4bit coder on :3301 and the cathedral council seat (qwen3.6-35b-a3b-4bit) on :1337 over mTLS. The cloud can be on fire; your git commit message still gets written. Routing is subscription-first: the Claude-backed seats (council-brain = Jocasta and Mon Mothma, council-max-thinking = Yoda and Mundi) lead with the best subscription Opus (claude-opus-4, served as Opus 4.8 through the Claude Max bridge on :3456) and fall back to the cathedral, never to metered OpenRouter. An earlier chain spilled those seats to OpenRouter when Opus throttled under concurrent council load; with the OpenRouter balance dry, that surfaced as 402 seat dropouts, so their fallbacks were rerouted to local-only (council-mlx, then the Codestral lane).
  3. Cost Control (Daily Caps) — The proxy tracks real-time USD spend in memory with per-agent daily caps baked into budget.rs (windu=$2.00, mothma=$2.00, jocasta=$1.50, everyone else $0.50). Blow the cap and cloud routes silently drop off your menu, folding you onto local inference. The agent that overspends doesn’t get an error — it gets a quieter model.
  4. Audit Trail & Observability — Every request appends a structured line to ~/.sanctum/logs/cloud-proxy-audit.jsonl — agent, model, tokens, cost, latency, running daily total. The /stats endpoint on :4040 serves the live tally as JSON, so “who spent what today” is one curl away, not a grep through a log.
  5. Provider Health Gating — There is no global rate limiter; what proxyd actually keeps is a sliding-window health score per provider in budget.rs (ProviderHealth: success rate + average latency, aged by exponential decay). Cross 30% errors and a provider is marked unhealthy and skipped in favour of the next link in the chain — routing around a flaky upstream instead of hammering it.
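
Layers 2, 3, and 5 meet in a single routing decision: walk the chain, skip unhealthy providers, and hide cloud routes from an agent that is over its cap. The sketch below is a minimal illustration under stated assumptions, not the code in budget.rs. The Route and Kind types, the pick_route function, the DECAY constant, and the route labels are hypothetical, and the health score here tracks only an exponentially decayed error rate, whereas the real ProviderHealth also folds in average latency. The caps and the 30% threshold are the ones quoted above.

```rust
use std::collections::HashMap;

/// Error rate above which a provider is skipped (the 30% from the text).
const UNHEALTHY_ERROR_RATE: f64 = 0.30;
/// Hypothetical decay factor: how much weight history keeps per new sample.
const DECAY: f64 = 0.8;

/// Simplified health score: an exponentially decayed error rate only.
#[derive(Default)]
struct ProviderHealth {
    error_rate: f64,
}

impl ProviderHealth {
    /// Fold one request outcome into the decayed error rate.
    fn record(&mut self, ok: bool) {
        let sample = if ok { 0.0 } else { 1.0 };
        self.error_rate = DECAY * self.error_rate + (1.0 - DECAY) * sample;
    }

    fn is_healthy(&self) -> bool {
        self.error_rate <= UNHEALTHY_ERROR_RATE
    }
}

#[derive(Clone, Copy, PartialEq, Debug)]
enum Kind {
    Cloud,
    Local,
}

/// One link in a fallback chain (hypothetical shape).
struct Route {
    name: &'static str,
    kind: Kind,
}

/// Per-agent daily caps in USD, as listed under Cost Control.
fn daily_cap_usd(agent: &str) -> f64 {
    match agent {
        "windu" | "mothma" => 2.00,
        "jocasta" => 1.50,
        _ => 0.50,
    }
}

/// Return the first link that is healthy and, for cloud routes, still within budget.
fn pick_route<'a>(
    agent: &str,
    spent_today_usd: f64,
    chain: &'a [Route],
    health: &HashMap<&str, ProviderHealth>,
) -> Option<&'a Route> {
    let over_cap = spent_today_usd >= daily_cap_usd(agent);
    chain.iter().find(|r| {
        // A provider with no recorded history is treated as healthy.
        let healthy = health.get(r.name).map_or(true, |h| h.is_healthy());
        // Local routes cost nothing, so they stay available past the cap.
        let affordable = r.kind == Kind::Local || !over_cap;
        healthy && affordable
    })
}

fn main() {
    let chain = [
        Route { name: "gemini", kind: Kind::Cloud },
        Route { name: "openrouter", kind: Kind::Cloud },
        Route { name: "local-mlx", kind: Kind::Local },
    ];
    let mut health: HashMap<&str, ProviderHealth> = HashMap::new();

    // Two consecutive failures push the first hop past the 30% threshold.
    let h = health.entry("gemini").or_default();
    h.record(false);
    h.record(false);

    // Under cap: the unhealthy first hop is skipped, the next cloud link answers.
    let r = pick_route("windu", 0.10, &chain, &health).unwrap();
    println!("windu   -> {}", r.name);

    // Over cap: cloud links drop out and the request lands on the local seat.
    let r = pick_route("jocasta", 1.60, &chain, &health).unwrap();
    println!("jocasta -> {}", r.name);
}
```

In this sketch an over-cap agent never receives an error as long as the chain ends in a local route, which matches the "quieter model" behaviour described above. A chain with no local link would return None, and the caller would have to decide what to send back.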

LLM routing sits on the critical path for every autonomous action in the Sanctum. The Python implementation added measurable latency on every hop — HTTP parsing, interpreter context switches, the requests import tax.

By folding the hardening logic into the sanctum-proxy Rust binary:

  • We dropped the Python interpreter and the requests library entirely — the resident set is now in the low tens of MB instead of the Python baseline.
  • Concurrency improved sharply under the tokio async runtime; SSE streams from many agents share one process without thread-per-request bloat.
  • Smart intent routing and security hardening now live in one repo, sanctum-rs, instead of straddling two languages.

proxyd loads its initial settings from ~/.sanctum/sanctum-proxy/config.yaml, but the config is live, not load-once:

  • Hot reloading: edits to config.yaml are caught by the notify crate’s filesystem watcher and swapped into memory via arc-swap, so a new fallback chain or model entry takes effect without dropping an in-flight SSE stream. You retune routing without bouncing the daemon — and without the agent mid-essay noticing.
  • Independent accounting: the proxy doesn’t trust the agent to report its own model. It resolves identity from headers or endpoint routing and meters the actual tokens the upstream returned. An agent that asks for a cheap model to dodge its cap still gets billed for what it really burned.
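
The hot-reload bullet relies on one property: a request that is already streaming keeps the config it started with, while the next request sees the edit. The sketch below shows that snapshot-and-swap pattern using only the standard library. It is an illustration under assumptions, not proxyd's code: RwLock<Arc<Config>> stands in for arc-swap, the notify watcher is omitted (store is simply called by hand), and the Config shape and the LiveConfig type are hypothetical.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical, pared-down config: just one fallback chain.
#[derive(Debug, Clone, PartialEq)]
struct Config {
    fallbacks: Vec<String>,
}

/// Shared handle to the current config.
/// Stand-in for an arc-swap handle; a std RwLock keeps the sketch dependency-free.
struct LiveConfig {
    current: RwLock<Arc<Config>>,
}

impl LiveConfig {
    fn new(initial: Config) -> Self {
        Self { current: RwLock::new(Arc::new(initial)) }
    }

    /// Take a snapshot. This clones the Arc pointer, not the Config itself.
    fn load(&self) -> Arc<Config> {
        self.current.read().unwrap().clone()
    }

    /// Publish a new config. A file watcher would call this after re-parsing.
    fn store(&self, next: Config) {
        *self.current.write().unwrap() = Arc::new(next);
    }
}

fn main() {
    let live = LiveConfig::new(Config {
        fallbacks: vec!["cloud".to_string(), "local".to_string()],
    });

    // An in-flight request takes its snapshot when it starts streaming.
    let in_flight = live.load();

    // The config file is edited and the re-parsed result is published.
    live.store(Config { fallbacks: vec!["local".to_string()] });

    // The in-flight request still holds the old chain; new requests see the edit.
    println!("in-flight: {:?}", in_flight.fallbacks);
    println!("new:       {:?}", live.load().fallbacks);
}
```

The old Config is freed once the last in-flight snapshot drops. The write lock is held only for the pointer swap, so readers are blocked for a very short time; arc-swap removes even that by making loads lock-free.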

The :4040 front door is born post-quantum. Every TLS handshake negotiates the hybrid X25519MLKEM768 key exchange (classical X25519 plus ML-KEM-768), so an adversary recording today’s traffic to decrypt once a quantum computer arrives harvests nothing usable. The inner proxyd-to-cathedral hop on :1337 enforces the same group with no classical fallback. Confirm it on the wire with ~/.sanctum/scripts/pqc-status.sh, which asserts the negotiated group per hop and refuses to run on a LibreSSL stack, where the check would report a false pass.

The signature half — migrating cert trust from classical ECDSA-P256 to ML-DSA-65 (FIPS 204) — is prepared and proven, not yet cut over. The rustls-post-quantum provider swaps are committed on feature branches (KEX preservation re-asserted so the swap can’t silently downgrade), the staging ML-DSA-65 PKI verifies, and the gateway’s real stack — Node 22 on OpenSSL 3.5.5 — was confirmed to verify an ML-DSA chain end to end. The live cutover is held behind a council-set checklist in ~/.sanctum/runbooks/PQC-PHASE4-STATUS-2026-06-18.md. The reasoning: the key-exchange hedge already closed the only harvest-now exposure, so signatures — which defend against a live quantum man-in-the-middle, a threat with no present actor — ship deliberately, not fast.