Sanctum Cloud Proxy

When your haus talks to the cloud, it shouldn’t just ask for an API key and hope for the best. The original sanctum-cloud-proxy was a ~100-line Python script that did the security checks by hand: fetching keys, tracking spend, managing fallbacks. It worked, but it kept the Python interpreter chained to our most critical networking path — every autonomous action waited on an interpreter start.
So we folded the whole hardening suite into the existing sanctum-proxy Rust crate. The Python proxy is gone; the native daemon (proxyd, launchd label com.sanctum.proxyd) does it all.
How It Works
proxyd binds 0.0.0.0:4040 (all interfaces), so it answers on both 127.0.0.1 and the 10.0.0.1 Mac↔VM bridge. It intercepts every LLM API request and runs it through five hardening layers before forwarding it to a cloud provider (Anthropic, OpenRouter, Gemini) or routing it to a local MLX seat.

The 5 Layers of Hardening
- Key Security: No API keys live in source, .env files, or Git. At runtime keychain.rs reads <PROVIDER>_API_KEY from the environment first (the launchd wrapper proxyd-launch injects GEMINI_API_KEY and OPENROUTER_API_KEY from ~/.sanctum/secrets/), then falls back to the macOS login keychain via the security-framework crate (service=<provider>-api-key, account=sanctum). A daemon must never block on a consent dialog, so a missing key fails closed: proxyd logs once and returns an empty key, letting the request end in a clean auth error instead of popping an AppleScript prompt in the operator’s session at 3 a.m.
- Fallback Chain: If a provider returns a 429 or a 5xx, or times out, proxyd walks a per-model fallback chain (the fallbacks: map in config.yaml) until something healthy answers; the client never learns the first hop failed. The chain bottoms out on local MLX seats: the Codestral-22B-v0.1-4bit coder on :3301 and the cathedral council seat (qwen3.6-35b-a3b-4bit) on :1337 over mTLS. The cloud can be on fire and your git commit message still gets written. Subscription-first routing: the Claude-backed seats (council-brain = Jocasta and Mon Mothma, council-max-thinking = Yoda and Mundi) lead with the best subscription Opus (claude-opus-4, served as Opus 4.8 through the Claude Max bridge on :3456) and fall back to the cathedral, never to metered OpenRouter. An earlier chain spilled those seats to OpenRouter when Opus throttled under concurrent council load; once the OpenRouter balance ran dry, that surfaced as 402 seat dropouts, so their fallbacks were rerouted to local-only (council-mlx, then the Codestral lane).
- Cost Control (Daily Caps): The proxy tracks real-time USD spend in memory, with per-agent daily caps baked into budget.rs (windu=$2.00, mothma=$2.00, jocasta=$1.50, everyone else $0.50). Blow the cap and cloud routes silently drop off your menu, folding you onto local inference. The agent that overspends doesn’t get an error; it gets a quieter model.
- Audit Trail & Observability: Every request appends a structured line to ~/.sanctum/logs/cloud-proxy-audit.jsonl recording agent, model, tokens, cost, latency, and the running daily total. The /stats endpoint on :4040 serves the live tally as JSON, so “who spent what today” is one curl away, not a grep through a log.
- Provider Health Gating: There is no global rate limiter. What proxyd actually keeps is a sliding-window health score per provider in budget.rs (ProviderHealth: success rate plus average latency, aged by exponential decay). Once a provider’s error rate crosses 30%, it is marked unhealthy and skipped in favour of the next link in the chain, so proxyd routes around a flaky upstream instead of hammering it.
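The key lookup order in Key Security can be sketched as follows. This is a minimal illustration, not keychain.rs itself: the name resolve_key is hypothetical, and both lookups are injected as closures (standing in for std::env::var and a non-interactive security-framework read) so the sketch never touches a real keychain. Modelling "logs once" with a process-wide set of already-warned providers is also an assumption.

```rust
use std::collections::HashSet;
use std::sync::{Mutex, OnceLock};

/// Providers already warned about, so a missing key is logged once rather than per request.
fn warned() -> &'static Mutex<HashSet<String>> {
    static WARNED: OnceLock<Mutex<HashSet<String>>> = OnceLock::new();
    WARNED.get_or_init(|| Mutex::new(HashSet::new()))
}

/// Hypothetical resolver: environment first, then keychain, else fail closed.
/// `env(var)` stands in for std::env::var; `keychain(service, account)` stands in
/// for a non-interactive login-keychain read (never a consent prompt).
pub fn resolve_key(
    provider: &str,
    env: impl Fn(&str) -> Option<String>,
    keychain: impl Fn(&str, &str) -> Option<String>,
) -> String {
    // 1. Environment, e.g. GEMINI_API_KEY injected by the launchd wrapper.
    let var = format!("{}_API_KEY", provider.to_uppercase());
    if let Some(key) = env(var.as_str()).filter(|k| !k.is_empty()) {
        return key;
    }
    // 2. Login keychain: service=<provider>-api-key, account=sanctum.
    let service = format!("{provider}-api-key");
    if let Some(key) = keychain(service.as_str(), "sanctum").filter(|k| !k.is_empty()) {
        return key;
    }
    // 3. Fail closed: warn once per provider and return an empty key, so the
    //    upstream call ends in an ordinary auth error instead of blocking.
    if warned().lock().unwrap().insert(provider.to_string()) {
        eprintln!("no API key for {provider}; cloud requests will fail auth");
    }
    String::new()
}
```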
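A minimal sketch of the fallback walk, assuming the fallbacks: map has already been parsed into a HashMap from model name to an ordered list of fallback models. The Attempt enum and the route function are hypothetical. The sketch also assumes that any other failure (a 401, say) is returned to the client as-is rather than triggering the chain; the description above implies this but does not state it.

```rust
use std::collections::HashMap;

/// Outcome of one upstream attempt (simplified).
#[derive(Debug, Clone, PartialEq)]
pub enum Attempt {
    Answer(String), // response body
    Status(u16),    // non-success HTTP status from the upstream
    Timeout,
}

/// 429, any 5xx, or a timeout moves on to the next link; everything else is final.
fn should_fall_back(attempt: &Attempt) -> bool {
    match attempt {
        Attempt::Status(429) | Attempt::Timeout => true,
        Attempt::Status(code) => (500..600).contains(code),
        Attempt::Answer(_) => false,
    }
}

/// Try `model`, then each entry of `fallbacks[model]` in order. Returns the hop that
/// produced the final answer, or the last failure if the whole chain is exhausted.
pub fn route<F>(model: &str, fallbacks: &HashMap<String, Vec<String>>, mut call: F) -> (String, Attempt)
where
    F: FnMut(&str) -> Attempt,
{
    let mut hops = vec![model.to_string()];
    if let Some(rest) = fallbacks.get(model) {
        hops.extend(rest.iter().cloned());
    }
    let mut last = None;
    for hop in hops {
        let result = call(hop.as_str());
        if !should_fall_back(&result) {
            return (hop, result);
        }
        last = Some((hop, result));
    }
    last.expect("the chain always contains the requested model")
}
```

Keeping should_fall_back separate from the walk means the retry policy can change without touching the chain logic.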
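The daily-cap behaviour can be sketched with an in-memory map of spend per agent. The cap values come from the list above. The Budget type, its method names, treating spend equal to the cap as over it, and the explicit reset call are assumptions for illustration, not the contents of budget.rs.

```rust
use std::collections::HashMap;

/// Per-agent daily caps in USD, as listed for budget.rs.
fn daily_cap(agent: &str) -> f64 {
    match agent {
        "windu" | "mothma" => 2.00,
        "jocasta" => 1.50,
        _ => 0.50,
    }
}

/// Hypothetical in-memory spend tracker.
#[derive(Default)]
pub struct Budget {
    spent_today: HashMap<String, f64>, // agent -> USD spent since the last reset
}

impl Budget {
    /// Record the cost of a completed request.
    pub fn record(&mut self, agent: &str, usd: f64) {
        *self.spent_today.entry(agent.to_string()).or_insert(0.0) += usd;
    }

    /// Cloud routes stay on the menu only while the agent is under its cap.
    pub fn cloud_allowed(&self, agent: &str) -> bool {
        let spent = self.spent_today.get(agent).copied().unwrap_or(0.0);
        spent < daily_cap(agent)
    }

    /// Pick a route without raising an error: over-cap agents get the local seat.
    pub fn choose<'a>(&self, agent: &str, cloud: &'a str, local: &'a str) -> &'a str {
        if self.cloud_allowed(agent) { cloud } else { local }
    }

    /// Called at the daily boundary (scheduling is not shown here).
    pub fn reset(&mut self) {
        self.spent_today.clear();
    }
}
```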
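The health gate reduces to an exponentially weighted moving average per provider. This sketch reuses the ProviderHealth name from budget.rs, but its fields, the decay factor alpha, and the first_healthy helper are hypothetical; only the 30% error threshold comes from the text. Treating providers with no history as healthy is also an assumption.

```rust
use std::collections::HashMap;

/// Decayed health for one provider. `alpha` weights the newest sample;
/// older samples fade geometrically, approximating a sliding window.
pub struct ProviderHealth {
    error_rate: f64,     // decayed fraction of failed requests, 0.0..=1.0
    avg_latency_ms: f64, // decayed mean latency
    samples: u64,
    alpha: f64,
}

impl ProviderHealth {
    pub fn new(alpha: f64) -> Self {
        Self { error_rate: 0.0, avg_latency_ms: 0.0, samples: 0, alpha }
    }

    pub fn observe(&mut self, ok: bool, latency_ms: f64) {
        let err = if ok { 0.0 } else { 1.0 };
        self.error_rate = self.alpha * err + (1.0 - self.alpha) * self.error_rate;
        // Seed latency with the first sample instead of decaying up from zero.
        self.avg_latency_ms = if self.samples == 0 {
            latency_ms
        } else {
            self.alpha * latency_ms + (1.0 - self.alpha) * self.avg_latency_ms
        };
        self.samples += 1;
    }

    /// Unhealthy once the decayed error rate crosses 30%.
    pub fn is_healthy(&self) -> bool {
        self.error_rate <= 0.30
    }

    pub fn avg_latency_ms(&self) -> f64 {
        self.avg_latency_ms
    }
}

/// First provider in `chain` not marked unhealthy; unknown providers count as healthy.
pub fn first_healthy<'a>(chain: &[&'a str], health: &HashMap<String, ProviderHealth>) -> Option<&'a str> {
    chain
        .iter()
        .copied()
        .find(|p| health.get(*p).map_or(true, |h| h.is_healthy()))
}
```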
Why Rust?
LLM routing sits on the critical path for every autonomous action in the Sanctum. The Python implementation added measurable latency on every hop: HTTP parsing, interpreter context switches, and the requests import tax.
By folding the hardening logic into the sanctum-proxy Rust binary:
- We dropped the Python interpreter and the requests library entirely; the resident set is now in the low tens of MB instead of the Python baseline.
- Concurrency improved sharply under the tokio async runtime: SSE streams from many agents share one process without thread-per-request bloat.
- Smart intent routing and security hardening now live in one repo, sanctum-rs, instead of straddling two languages.
Configuration & Dynamic Healing
proxyd loads its initial settings from ~/.sanctum/sanctum-proxy/config.yaml, but the config is live, not load-once:
- Hot reloading: edits to config.yaml are caught by the notify crate’s filesystem watcher and swapped into memory via arc-swap, so a new fallback chain or model entry takes effect without dropping an in-flight SSE stream. You retune routing without bouncing the daemon, and an agent mid-essay never notices.
- Independent accounting: the proxy doesn’t trust the agent to report its own model. It resolves identity from request headers or endpoint routing and meters the tokens the upstream actually returned, so an agent that asks for a cheap model to dodge its cap is still billed for what it really burned.
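The hot-reload pattern is a whole-snapshot swap: each request takes an Arc of the current config once, and a reload replaces the pointer without mutating the old snapshot. proxyd does this with arc-swap; the sketch below uses a std-only RwLock<Arc<Config>> stand-in with the same semantics (at the cost of a brief lock on each read) and leaves out the notify watcher. The Config shape, the LiveConfig type, and the rule of swapping only after config.yaml parses successfully are assumptions.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// The parsed routing config (only the part this sketch needs).
#[derive(Debug)]
pub struct Config {
    pub fallbacks: HashMap<String, Vec<String>>,
}

/// Hypothetical shared handle; a std stand-in for arc-swap's ArcSwap.
pub struct LiveConfig {
    current: RwLock<Arc<Config>>,
}

impl LiveConfig {
    pub fn new(cfg: Config) -> Self {
        Self { current: RwLock::new(Arc::new(cfg)) }
    }

    /// A request takes one snapshot and keeps it for its whole lifetime,
    /// so a reload mid-stream never changes the rules under it.
    pub fn snapshot(&self) -> Arc<Config> {
        Arc::clone(&self.current.read().unwrap())
    }

    /// Called by the file watcher after the new config.yaml parses successfully.
    pub fn replace(&self, cfg: Config) {
        *self.current.write().unwrap() = Arc::new(cfg);
    }
}
```

Because the old snapshot is only dropped when its last holder finishes, an SSE stream that started before the reload completes under the rules it started with.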
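Independent accounting amounts to pricing the usage the upstream reports back rather than anything the agent claims. In this sketch the UpstreamUsage struct, the per-million-token PriceTable, and cost_usd are all hypothetical, and no real prices are implied; the point is that the model name the agent requested is not an input to the calculation. How proxyd handles a model with no price entry is not described, so the sketch simply returns None.

```rust
use std::collections::HashMap;

/// Usage as reported by the upstream response, not by the agent.
pub struct UpstreamUsage {
    pub model: String, // model the provider says actually served the call
    pub input_tokens: u64,
    pub output_tokens: u64,
}

/// USD per million tokens as (input, output); values would come from config.
pub type PriceTable = HashMap<String, (f64, f64)>;

/// Cost of one request, computed only from what the upstream returned.
pub fn cost_usd(usage: &UpstreamUsage, prices: &PriceTable) -> Option<f64> {
    let &(input_price, output_price) = prices.get(&usage.model)?;
    let input = usage.input_tokens as f64 * input_price;
    let output = usage.output_tokens as f64 * output_price;
    Some((input + output) / 1_000_000.0)
}
```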
Post-Quantum Transport
The :4040 front door is born post-quantum. Every TLS handshake negotiates the hybrid X25519MLKEM768 key exchange (classical X25519 plus ML-KEM-768), so an adversary recording today’s traffic to decrypt once a quantum computer arrives harvests nothing usable. The inner proxyd-to-cathedral hop on :1337 enforces the same group with no classical fallback. Confirm it on the wire with ~/.sanctum/scripts/pqc-status.sh, which asserts the negotiated group per hop and refuses to run on a LibreSSL stack that would falsely report green.
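The per-hop assertion can be expressed as a small policy check. This is a hypothetical Rust rendering of the idea, not a transcription of pqc-status.sh: it compares the negotiated TLS group against the IANA TLS Supported Groups codepoints for X25519MLKEM768 (0x11EC) and classical X25519 (0x001D), and refuses outright when the reporting tool identifies itself as LibreSSL. The check_hop and Verdict names are invented for illustration, and obtaining the negotiated group from a live handshake is left out.

```rust
/// IANA TLS Supported Groups codepoints relevant here.
const X25519: u16 = 0x001D;
const X25519_MLKEM768: u16 = 0x11EC;

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Pass,
    Fail(String),
    Refuse(String), // the tooling cannot be trusted to report the group
}

/// `tool_version` is, e.g., the first line of `openssl version`. A LibreSSL
/// stack is refused rather than risk a false green.
pub fn check_hop(hop: &str, negotiated_group: u16, tool_version: &str) -> Verdict {
    if tool_version.starts_with("LibreSSL") {
        return Verdict::Refuse(format!("{tool_version}: cannot attest the PQ key exchange"));
    }
    match negotiated_group {
        X25519_MLKEM768 => Verdict::Pass,
        X25519 => Verdict::Fail(format!("{hop}: downgraded to classical X25519")),
        other => Verdict::Fail(format!("{hop}: unexpected group 0x{other:04X}")),
    }
}
```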
The signature half — migrating cert trust from classical ECDSA-P256 to ML-DSA-65 (FIPS 204) — is prepared and proven, not yet cut over. The rustls-post-quantum provider swaps are committed on feature branches (KEX preservation re-asserted so the swap can’t silently downgrade), the staging ML-DSA-65 PKI verifies, and the gateway’s real stack — Node 22 on OpenSSL 3.5.5 — was confirmed to verify an ML-DSA chain end to end. The live cutover is held behind a council-set checklist in ~/.sanctum/runbooks/PQC-PHASE4-STATUS-2026-06-18.md. The reasoning: the key-exchange hedge already closed the only harvest-now exposure, so signatures — which defend against a live quantum man-in-the-middle, a threat with no present actor — ship deliberately, not fast.