Sanctum Firewalla

For three years the haus’s screen-time enforcement leaned on a ~1.5k-line Node.js bridge — ~/.openclaw/firewalla-bridge.js — that wrapped Firewalla’s encipher WebSocket SDK and exposed an HTTP surface on port 1984. It worked, and it worked defensively — dozens of try/catch blocks, retry ladders, and /health-honesty plumbing — but two production incidents in 2026-05 (the bridge-500 socket-stale episode and the 2026-05-30 partial-pause Connection-Reset that left an Apple TV reachable for 2.5 hours past curfew) exposed the cost of putting that much load-bearing logic in single-process JavaScript with zero test coverage. A bridge defended one try/catch at a time is only as honest as the developer who last counted them. The Rust port lives in the sanctum-firewalla crate of the sanctum-rs workspace, on the feat/firewalla-bridge-rust-port branch.
Why not “just rewrite it in Rust”
The Firewalla encipher SDK is JavaScript-only and undocumented at the protocol level; reverse-engineering its WebSocket handshake into a Rust crate is a research project of its own. The realistic shape is therefore a Rust HTTP+SSH bridge with a Node sidecar for the SDK: Rust owns ~95% of the logic (HTTP serving, bearer auth, SSH executor, discovery, persistent state, metrics, retry policy, request multiplexing), and Node owns the ~5% that has to touch the SDK (a ~190-LOC sidecar process that Rust spawns and drives over line-protocol JSON on stdin/stdout). The wire contract is in services/sanctum-firewalla/docs/SIDECAR_IPC.md and is designed to be replaceable: when a Rust crate for encipher appears, the sidecar gets deleted and the sdk module dispatches calls directly.
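The request multiplexing mentioned above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the crate's actual code: the `Multiplexer` type and its `register` and `dispatch` methods are hypothetical names, and the JSON framing defined in SIDECAR_IPC.md is left out entirely, so payloads are plain strings here.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Tracks in-flight sidecar requests, keyed by request id (hypothetical type).
struct Multiplexer {
    next_id: u64,
    pending: HashMap<u64, Sender<String>>,
}

impl Multiplexer {
    fn new() -> Self {
        Multiplexer { next_id: 1, pending: HashMap::new() }
    }

    /// Registers a request and returns its id plus the receiver the caller waits on.
    /// A real bridge would also write the request line to the sidecar's stdin here.
    fn register(&mut self) -> (u64, Receiver<String>) {
        let id = self.next_id;
        self.next_id += 1;
        let (tx, rx) = channel();
        self.pending.insert(id, tx);
        (id, rx)
    }

    /// Routes one response read from the sidecar's stdout to its waiter.
    /// Returns false when no request with that id is pending.
    fn dispatch(&mut self, id: u64, payload: String) -> bool {
        match self.pending.remove(&id) {
            Some(tx) => tx.send(payload).is_ok(),
            None => false,
        }
    }
}
```

Because each waiter is looked up by id, responses may arrive in any order: registering two requests and dispatching the second response first still delivers each payload to its own receiver.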
The 24-endpoint contract
The Rust bridge is byte-compatible with the Node bridge on every endpoint screen-time and the sentinels call, save the two deliberate divergences called out below. The contract lives at services/sanctum-firewalla/docs/API_CONTRACT.md and is the load-bearing spec for the cutover. Highlights:
- `/health` is publicly accessible; everything else requires bearer auth (`Authorization: Bearer <token>`, token resolved at boot from the `$FIREWALLA_BRIDGE_TOKEN` env var, then the macOS keychain).
- 24 endpoints total: host inventory + state, policy management, MAC pause, feature toggles, DNS, alarms, stats, flows, speedtest, export.
- Two intentional contract changes versus the Node bridge:
  - `Policy already exists` returns HTTP 200 with `{success: true, alreadyExists: true}` instead of HTTP 500. The Node bridge treated Firewalla’s “this rule is already in place” reply as an error; every caller was already retry-looping over it, at a cost of ~355 false-error log lines per day. The Rust port normalises this to a success-with-flag.
  - `/ping` is moved into the public (no-auth) subrouter alongside `/health`. In the Node bridge `/ping` sits behind the bearer gate; the Rust port treats it as a liveness probe like `/health`. This is a deliberate divergence, not a byte-for-byte port, flagged here so the cutover diff harness expects it instead of failing on it.
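The success-with-flag normalisation can be pictured as a small mapping from the SDK outcome to an HTTP status and body. The sketch below is illustrative only: the `CreateOutcome` enum and `to_http` function are hypothetical names, and the shape of the failure body is an assumption, since the source only specifies the `alreadyExists` envelope.

```rust
/// Outcome of a policy-create call as seen by the bridge (hypothetical shape).
enum CreateOutcome {
    Created,
    AlreadyExists,
    Failed,
}

/// Maps an outcome to (HTTP status, JSON body).
fn to_http(outcome: &CreateOutcome) -> (u16, String) {
    match outcome {
        CreateOutcome::Created => (200, r#"{"success":true}"#.to_string()),
        // The Node bridge answered 500 here; the port reports success with a flag.
        CreateOutcome::AlreadyExists => {
            (200, r#"{"success":true,"alreadyExists":true}"#.to_string())
        }
        // Failure body shape is an assumption, not taken from API_CONTRACT.md.
        CreateOutcome::Failed => (500, r#"{"success":false}"#.to_string()),
    }
}
```

Callers that used to retry on the 500 can instead branch on the `alreadyExists` flag, which is what removes the false-error log lines.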
Contract drift notice (2026-06-11)
The Node bridge, still the live deployment, grew past the 24-endpoint
contract in the June 10-11 sprint: QoS throttle/unthrottle (bedtime
wind-down), /host/:mac/app-usage, /system/qos, /policies/purge, plus
the seven dns-floor routes (feature/tag/network projections and scoped
policy writes). The Rust port must catch up to the Node bridge’s contract
before any cutover talk resumes. The Node file remains the source of truth;
this page’s count is historical.
The expire contract (learned the hard way)
Firewalla’s policy.expire is a duration in seconds since activation,
not an epoch timestamp (Policy.js: activatedTime + expire < now). Send
an epoch and two things happen, both bad: the rule never expires naturally
(hello, 2,014-row corpse pile), and if you also set autoDeleteWhenExpires,
the box schedules setTimeout(epoch × 1000) — which overflows Node’s 32-bit
timer and fires the delete immediately. Your rule is born, assigned a
pid, and executed within the second, with nothing in any log. The bridge
converts at the boundary (expireEpochToDuration); callers keep sending
epochs and live happily ever after. Any future port MUST preserve this
conversion or rediscover it the way we did: at 1 a.m., with tcpdump.
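A Rust counterpart of the boundary conversion might look like the sketch below. It is an illustration under assumptions, not the bridge's code: `expire_epoch_to_duration` and `overflows_node_timer` are hypothetical names, activation is assumed to happen at roughly `now_epoch`, and past expiries are mapped to `None` as one possible policy. The timer limit is the documented Node.js maximum `setTimeout` delay of 2^31 - 1 milliseconds.

```rust
/// Largest delay Node's setTimeout honours, in milliseconds (2^31 - 1).
const NODE_TIMER_MAX_MS: u64 = 2_147_483_647;

/// Converts a caller-supplied epoch expiry (seconds) into a duration in seconds
/// counted from `now_epoch`. Returns None when the expiry is not in the future.
fn expire_epoch_to_duration(expire_epoch: u64, now_epoch: u64) -> Option<u64> {
    expire_epoch.checked_sub(now_epoch).filter(|secs| *secs > 0)
}

/// True when `seconds * 1000` exceeds the timer limit, which is the case
/// for any raw epoch value passed where a duration was expected.
fn overflows_node_timer(seconds: u64) -> bool {
    seconds.saturating_mul(1000) > NODE_TIMER_MAX_MS
}
```

With a current time of 1_780_000_000 and an expiry one hour later, `expire_epoch_to_duration(1_780_003_600, 1_780_000_000)` gives `Some(3600)`, which stays well under the timer limit. Passing the raw epoch 1_780_003_600 as if it were a duration exceeds the limit by several orders of magnitude.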
The five-module decomposition (and the swarm that built it)
| Module | Owns | Tests |
|---|---|---|
| auth | bearer-token resolution (env → keychain), require_bearer axum middleware with constant-time compare | 7 |
| http | axum Router with all 24 endpoints, parameter extractors, success envelope, IntoResponse mapping | 4 full-stack (in tests/contract.rs) |
| sdk | spawns + supervises the Node sidecar, multiplexes requests by request_id, restarts on 3 sdk_disconnect within 60s | 2 (incl. the multiplexer-with-out-of-order-responses test) |
| ssh | russh executor (single persistent session), iptables-rule builder, tri-state dnsmasq conf inspector | 15 (incl. the 2026-05-18 --RESIDUAL-- sentinel preserved) |
| discovery | TCP probe + mDNS + topology cache + rediscover state machine, populates /health | 6 |
| state | AppState + atomic-rename persistence at ~/.openclaw/firewalla/state.json + the 3-strike supervisor signal | 10 |
The five modules were built in parallel by five isolated-worktree sub-agents in a single working session — apple-way + military-grade: each agent had API_CONTRACT.md + SIDECAR_IPC.md + its own module skeleton + tight definition-of-done; collisions were impossible (each owned one file), and the integration glue (lib.rs re-exports, sidecar handle wiring) was the only sequential work left for the integrator. 47 lib tests + 7 integration tests in tests/contract.rs (3 pure-router contract checks + 4 full-stack through the mock sidecar), zero ignored. The http module ships zero inline tests on purpose: its whole job is wiring, so it’s proven end-to-end through the router instead of unit-by-unit.
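For readers unfamiliar with the constant-time compare listed under the auth module, here is a minimal standard-library sketch of the idea. The function names are hypothetical and this is not the crate's require_bearer middleware; production code would more likely rely on a vetted crate such as `subtle`, since a hand-written loop gives no hard guarantee against compiler optimisations.

```rust
/// Compares two byte strings without exiting early on the first mismatch.
/// The length check does reveal whether the lengths differ.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        // Accumulate differences instead of returning at the first one.
        diff |= x ^ y;
    }
    diff == 0
}

/// Extracts the token from an `Authorization: Bearer <token>` header value
/// and checks it against the expected token.
fn bearer_matches(header: Option<&str>, expected: &str) -> bool {
    match header.and_then(|h| h.strip_prefix("Bearer ")) {
        Some(token) => constant_time_eq(token.as_bytes(), expected.as_bytes()),
        None => false,
    }
}
```

The point of accumulating with `|=` is that the time taken does not depend on where the first differing byte sits, so an attacker cannot recover the token one byte at a time by timing responses.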
Resilience contract
Every endpoint upholds the same four guards:
- Discovery: on SDK init failure, the bridge re-discovers the Firewalla IP via topology cache → mDNS service browse → mDNS hostnames → TCP probe of common gateway IPs (`192.0.2.1`, `192.168.2.1`).
- SSH fallback: every `policy:delete` and `policy:create` tries the SDK path first, then the SSH iptables fallback if the SDK returns non-success. Mirrors the Node bridge’s ladder exactly.
- SDK socket supervision: the `state` module counts consecutive `sdk_disconnect` events; the `sdk` module’s supervisor restarts the sidecar process after 3 within 60s.
- Bounded timeouts: 12s outer on SDK calls, 8s on SSH commands. Exceeding either returns 504, not a hang.
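The "3 within 60s" supervision rule can be sketched as a sliding-window counter. This is one plausible reading of the rule, not the state module's implementation: the `DisconnectWindow` type is hypothetical, timestamps are plain seconds so the logic is easy to test, and clearing the window after a restart signal or after a successful call is an assumption.

```rust
use std::collections::VecDeque;

/// Signals a restart after `limit` disconnects within `window_secs` (hypothetical type).
struct DisconnectWindow {
    limit: usize,
    window_secs: u64,
    events: VecDeque<u64>,
}

impl DisconnectWindow {
    fn new(limit: usize, window_secs: u64) -> Self {
        DisconnectWindow { limit, window_secs, events: VecDeque::new() }
    }

    /// Records a disconnect at `now_secs`; returns true when the sidecar should restart.
    fn record(&mut self, now_secs: u64) -> bool {
        self.events.push_back(now_secs);
        // Drop events that have aged out of the window.
        while let Some(&oldest) = self.events.front() {
            if now_secs.saturating_sub(oldest) >= self.window_secs {
                self.events.pop_front();
            } else {
                break;
            }
        }
        if self.events.len() >= self.limit {
            // Start counting afresh once the restart has been signalled.
            self.events.clear();
            true
        } else {
            false
        }
    }

    /// Clears the strikes, for example after a successful SDK call (assumed behaviour).
    fn reset(&mut self) {
        self.events.clear();
    }
}
```

With `DisconnectWindow::new(3, 60)`, disconnects at seconds 0, 10 and 20 trigger a restart on the third call, while disconnects at 0, 50 and 100 do not, because the first has aged out by the time the third arrives.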
The cutover (Phase 4)
Where the port stands on the day of writing:
| Step | State |
|---|---|
| 1. Sidecar mock + full-stack tests | DONE |
| 2. Shadow run: Rust on :1985, Node on :1984, screen-time smoke harness diffs responses | SCHED (operator-gated) |
| 3. Swap: Rust takes :1984, Node demoted to :1985 as the fall-back | SCHED (pending step 2) |
| 4. Seven-day soak; retire the Node bridge on clean operation | SCHED (pending step 3) |
The live ~/.openclaw/firewalla-bridge.js is untouched and stays so until step 3.
Versioning
- v0.1.0: initial port. Endpoint-identical to the Node bridge except the `Policy already exists` normalisation.
- v1.0.0: Node bridge retired.