
Roadmap

Tommy at the drafting table — shipped features in ink, Phase 2 in deliberate pencil

This page is no longer a graveyard of half-remembered ambition. The audit phase is complete enough that the roadmap can be narrower and more honest.

Sanctum’s core feature set is mechanically proven. The remaining work is not “make the Living Force real.” The remaining work is making Sanctum reproducible, legible, and releasable without depending on one Mac Mini and one operator’s memory.

There is one deliberate gate before that work resumes: a one-week stabilization window. See Stability Window. If Sanctum cannot remain boring for seven days, it has not earned fresh ambition.

For current verified system shape, see Operational State. For the evidence behind the feature claims, see Feature Status Matrix. For the implementation split across workspace, runtime, and supporting repos, see Implementation Audit and Runtime Drift Audit.

  • now means active Phase 2 infrastructure work
  • next means queued immediately after the current items
  • later means valuable, but not a prerequisite for productization
  • not planned means intentionally parked until reality argues otherwise

Sanctum is now mechanically stable as a single-operator system, but it is not yet productized in the stronger sense of being portable, repeatable, and supportable without inheriting Bert’s machine shape.

  • Stability: 8/10
    The runtime, audit wall, calibration checks, and live-system probes now behave like real infrastructure instead of aspirational architecture.
  • Recoverability: 7.5/10
    Self-heal paths are real and tested, but some of that resilience still depends on bespoke restart scripts and machine-local supervision conventions.
  • Reproducibility: 3.5/10
    Too much of the system still assumes /Users/neo, local keychain state, ~/Projects/* repos, launchd behavior, and adjacent private runtime surfaces.
  • Productization readiness: 4.5/10
    Sanctum is ready for continued hardening on the current machine and maybe a tightly controlled second-machine rollout, but not yet for external beta users or a supportable install story.

The practical conclusion is simple: the feature surface is no longer the bottleneck. The bottleneck is turning a highly capable personal runtime into a portable operator product.

One-Week Stability Window

Freeze the architecture and let it behave. Phase 2 work stays blocked until Sanctum survives a seven-day soak with the audit wall, runtime checks, and docs build still clean.

Deliverables: stability policy, soak state tracking, exit gate, one-week calm runtime.
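A sketch of what soak state tracking and the exit gate could look like. The seven-day threshold comes from the policy above; everything else here (field names, the daily pass/fail shape) is illustrative, not the shipped tracker:

```python
# Illustrative soak tracker: one pass/fail observation per day; any
# failure resets the streak, and the exit gate opens only after seven
# consecutive clean days.
from dataclasses import dataclass, field

SOAK_DAYS = 7  # the one-week window from the stability policy

@dataclass
class SoakState:
    streak: int = 0                              # consecutive clean days
    history: list = field(default_factory=list)  # (day, clean) audit trail

    def record(self, day: str, clean: bool) -> None:
        self.history.append((day, clean))
        self.streak = self.streak + 1 if clean else 0

    def exit_gate_open(self) -> bool:
        return self.streak >= SOAK_DAYS

state = SoakState()
for day in range(1, 8):
    state.record(f"2026-05-{day:02d}", clean=True)
print(state.exit_gate_open())  # → True: seven clean days, Phase 2 unblocks
```

The reset-on-failure rule is the point: a mostly calm week does not open the gate, only an unbroken one does.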

Portable Bootstrap

Turn sanctumctl from a strong operator surface into a genuinely reproducible install path. A fresh compatible machine should be able to bootstrap the checked-in slice, render manifests, sync calibration artifacts, and run the audit wall without hidden shell lore.

Deliverables: install profile, prerequisite checks, bootstrap docs, one-command verify path.
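The shape of that one-command verify path can be sketched in a few lines. Every name below (the prerequisite list, the subcommands, the flags) is an assumption for illustration, not sanctumctl's actual interface:

```python
# Hedged sketch of a verify path: check prerequisites first, then run
# each verify step in order and stop at the first failure.
import shutil
import subprocess

PREREQS = ["git", "python3"]              # assumed minimum toolchain
VERIFY_STEPS = [
    ["sanctumctl", "render", "--check"],  # render manifests (assumed flag)
    ["sanctumctl", "audit", "--all"],     # run the audit wall (assumed flag)
]

def missing_prereqs(prereqs=PREREQS):
    """Return the prerequisite binaries not found on PATH."""
    return [p for p in prereqs if shutil.which(p) is None]

def verify(steps=VERIFY_STEPS):
    """Run each step in order; report the first failure and stop."""
    for step in steps:
        if subprocess.run(step).returncode != 0:
            return f"FAIL: {' '.join(step)}"
    return "OK"
```

The design choice worth keeping is the early exit: a fresh machine should get one unambiguous first failure, not a wall of cascading errors.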

Environment Resolver

Separate machine-specific state from the core product surface. Paths, host capabilities, LaunchAgent assumptions, and runtime conventions need a clearer boundary so Sanctum can target more than one personal setup without pretending every machine is identical.

Deliverables: resolved host profile, explicit machine overlays, fewer hardwired assumptions in adjacent repos.
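The resolver's core move is a deep merge where the machine overlay wins over the base profile. The profile keys below are examples, not the real schema:

```python
# Illustrative resolver core: a resolved host profile is the base profile
# deep-merged with a machine overlay; nested dicts merge, scalars replace.
def resolve(base: dict, overlay: dict) -> dict:
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = resolve(merged[key], value)  # recurse into sections
        else:
            merged[key] = value                        # overlay wins outright
    return merged

base = {"paths": {"home": "/Users/neo", "repos": "~/Projects"},
        "supervisor": "launchd"}
overlay = {"paths": {"home": "/Users/ops"}}            # second-machine overlay
print(resolve(base, overlay))
# home comes from the overlay; repos and supervisor fall through from base
```

The payoff is that machine differences become a small, reviewable overlay file instead of ambient assumptions scattered across repos.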

Release Discipline

Make every release rerunnable instead of ceremonial. The current test wall is strong; the next step is one release-grade entrypoint that runs the audits, runtime checks, and docs build, then generates a coherent release note from the verified state.

Deliverables: release script or workflow, version stamp, audit summary, docs build gate.
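One way the release-grade entrypoint could be structured. The gate names and commands below, including the mkdocs step, are placeholders for the real audit, runtime-check, and docs-build entrypoints, not the shipped workflow:

```python
# Placeholder release gate: run every check in order, stop at the first
# failure, and only call the release stampable when all gates passed.
import subprocess

GATES = [
    ("audits", ["sanctumctl", "audit", "--all"]),
    ("runtime checks", ["sanctumctl", "check", "--live"]),
    ("docs build", ["mkdocs", "build", "--strict"]),
]

def run_gates(gates=GATES) -> dict:
    """Return a name -> passed mapping; later gates never run after a failure."""
    results = {}
    for name, cmd in gates:
        results[name] = subprocess.run(cmd).returncode == 0
        if not results[name]:
            break
    return results

def release_ok(results: dict, total: int) -> bool:
    """Stampable only if every gate actually ran and every gate passed."""
    return len(results) == total and all(results.values())
```

Keeping the gate a single rerunnable command is what makes the release note trustworthy: it summarizes what just ran, not what someone remembers running.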

sanctum-cli v0.8 — public install path

sanctum-cli is productization-ready as of v0.7.1 (12 top-level commands, 4 cloud backends, 201 / 201 tests, R2 default with $0 egress, GitHub Tier 0, sanctum onboard --recipe family --yes runs end-to-end in ~3-5 minutes). v0.8 is packaging + polish so people who aren’t Bertrand can install it.

Deliverables: brew formula + homebrew-sanctum tap, Sigstore-signed releases with verified self-update, atomic-replace flow for cloud_backup, public README + demo recording, sanctum dashboard TUI for the wow-factor demo. Target: May.

Public Demo Path

Build the reveal layer. Sanctum needs a clean, intentional demo path that shows the parts worth believing: watchdog self-heal, Code Forge rollback, Jocasta context retrieval, Tech Lookout dispatch, and Force Flow delivery. The point is not theatrics. The point is making the real system legible.

Deliverables: scripted walkthrough, stable demo fixtures, screenshots or clips, one page tying the sequence together.

Cross-Repo Contract Cleanup

Reduce implicit coupling across sanctum, ~/.sanctum, openclaw-skills, and the support repos. Adjacent systems should declare what Sanctum expects from them instead of relying on shared memory and ambient convention.

Deliverables: clearer interface docs, fewer surprise path assumptions, explicit ownership boundaries.

Kitchen Loop Coverage Expansion

The Kitchen Loop core is now implemented in the workspace slice. The next work is expansion: deeper L4 state-delta checks, broader scenario coverage, and tighter runtime attachment for real haushold automations and memory drift controls.

Why later: the mechanism exists; the remaining work is coverage depth rather than foundational productization.

Monorepo or Workspace Consolidation

There is still a plausible future where more of the Rust and support tooling lives in a tighter workspace. That may improve release ergonomics, but it is a secondary move. First the system needs clearer contracts; then it can choose a cleaner home.

Why later: consolidation without stronger boundaries just relocates confusion.

Context-Aware Automation Expansion

Features like Frigate-backed alert semantics and richer haushold automation remain appealing. They are real product features, but they are downstream of making Sanctum easier to install, verify, and trust.

Why later: new features should follow product discipline, not precede it.

Qwen3-TTS — Local Yoda Voice

Done 2026-04-19. Qwen3-TTS via mlx-audio replaces XTTS-v2 for Yoda’s voice agent. The XTTS plist is now .retired on disk; Qwen3-TTS runs as com.sanctum.yoda-tts-worker on :8008 (workers.tts_server Python module). Zero-shot voice cloning from reference audio. Lower memory pressure than XTTS at equal quality.

Firewalla Gold Pro Migration

Export current Firewalla Purple config and migrate to Gold Pro. Export/import scripts exist (tools/firewalla-export.sh, tools/firewalla-import.sh). Bridge has GET /export endpoint. SSH iptables fallback handles the current firmware’s broken policy:delete API — verify this is still needed on Gold Pro firmware.

Why later: Purple still works. Migration when hardware arrives.

Full Local Voice Pipeline

Replace all cloud dependencies in the voice agent: Deepgram STT → local Whisper via mlx-audio, Claude LLM → local Qwen3.6-35B-A3B-4bit via sanctum-mlx. Combined with Qwen3-TTS, this yields a zero-cloud voice pipeline. Phone calls to Yoda with no data leaving the haus.

Why later: each component needs to be fast enough individually first. Qwen3-TTS speed is the current bottleneck.
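As a shape, the zero-cloud turn is three swappable stages composed into one call. The stubs below are deliberately trivial placeholders, not the real component APIs:

```python
# Stub pipeline: each stage is a plain callable so local and cloud
# backends stay interchangeable. Real implementations would wrap
# Whisper (STT), Qwen3.6 via sanctum-mlx (LLM), and Qwen3-TTS (TTS).
def stt(audio: bytes) -> str:
    return audio.decode()                  # stub: treat audio as text

def llm(prompt: str) -> str:
    return f"Heard you, I have: {prompt}"  # stub reply, no model involved

def tts(text: str) -> bytes:
    return text.encode()                   # stub: treat text as speech

def voice_turn(audio: bytes) -> bytes:
    """Mic audio in, synthesized speech out, no network hop in between."""
    return tts(llm(stt(audio)))

print(voice_turn(b"what is 2 + 2?").decode())
```

Swapping Deepgram for Whisper or Claude for Qwen then changes one callable, not the turn loop, which is what makes the per-component speed work independent.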

Council fallback — restored 2026-04-25

The dead council-guardian.sh::activate_fallback() automation was retired (the script went 257 → 203 lines, with three structural reasons documented). The replacement lives in the Smart Router cathedral where it always belonged: the com.sanctum.mlx-py-fallback plist runs Python mlx_lm.server with mlx-community/Qwen3.5-9B-4bit on 127.0.0.1:8901 (plain HTTP loopback; mTLS is sanctum-server’s job at :8900), and services.sanctum_server.backends.council-secure.fallback_urls lists http://127.0.0.1:8901/v1 after the existing MBP shadow URL.

End-to-end drill on Mini (2026-04-25): Rust primary warm 0.51 s; failover (Rust booted out) cold-load 75 s on first request, 1.07 s warm thereafter; Rust restored 0.51 s. Quality: "2 + 2 equals 4." from Rust, "2 + 2 = **4**" from Python — both correct, minor stylistic difference (different model).

Memory cost: ~5 GB resident (Python holds the 9B model warm). Mini swap settled at ~10 GB after the cache cap was applied — ~/.sanctum/bin/mlx-server-with-caps.py wraps mlx_lm.server and calls mx.set_cache_limit(1 GB) + mx.clear_cache() before launch (mlx_lm.server has no equivalent flag and MLX honors no env var for this). Verified bounded: swap stayed flat at 10.85 GB across a 5-request burst. The wrapper is the same pattern as the Rust binary’s --metal-cache-limit-mb flag.
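The wrapper pattern can be reconstructed roughly as follows. This is only the shape of the shipped script named above, assuming mlx.core's set_cache_limit/clear_cache calls and an importable mlx_lm.server entrypoint:

```python
# Sketch of the cap-then-delegate wrapper: bound the Metal cache before
# handing control to the stock server, since mlx_lm.server exposes no
# flag or env var for this.
def cache_limit_bytes(mb: int) -> int:
    """mx.set_cache_limit takes bytes; configs tend to speak in megabytes."""
    return mb * 1024 * 1024

def serve_with_cap(limit_mb: int = 1024) -> None:
    # Deferred imports keep the helper above usable without MLX installed.
    import mlx.core as mx
    from mlx_lm import server  # assumed entrypoint module

    mx.set_cache_limit(cache_limit_bytes(limit_mb))  # bound Metal cache growth
    mx.clear_cache()                                 # start from a clean cache
    server.main()                                    # then run the stock server

print(cache_limit_bytes(1024))  # → 1073741824, the 1 GB cap from the drill
```

Cap first, clear second, delegate last: the stock server then runs unmodified under a bounded cache, the same pattern as the Rust binary's --metal-cache-limit-mb flag.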

Receipts: project_failover_drill_2026_04_25.md in agent memory.

Gemma 4 (and other archs) in sanctum-mlx

Today sanctum-mlx::LoadedModel dispatches qwen3_5 and qwen3_5_moe only — anything else fails at config parse and falls back to Python mlx_lm.server. A gemma3.rs loader (Gemma 3/4 share the same Rust shape) plus a third LoadedModel::Gemma3 variant would land Gemma on the same fast path Qwen rides. Estimated 1–2 days of focused work. The fused TurboQuant attention kernel would need a small adapter — FullAttention::forward integration currently lives in qwen3_5.rs.

The honest reason it’s later, not next: this is vendor diversification and abstraction maturity, not the resilience win it sounds like. The shared substrate (MLX runtime, Metal driver, custom kernels, launchd, mTLS) is what actually fails — a second loader doesn’t help when the kernel takes both models down. The Python fallback already covers “the Rust path is broken” for any model upstream mlx-lm supports, including Gemma. Build it when there’s a concrete workload reason (Gemma 4 LoRAs that want fast serving, or a council member that needs Gemma’s tokenizer), not for generic resilience.

The actually-resilience-improving moves come first: exercise the Python fallback regularly (a drill is worth more than a duplicate loader), document the failover decision points, harden the shared kernel layer.

Holocron for iPad — Satellite Command

A native iPadOS app that turns any iPad into a full Sanctum control surface. Not a web view in a frame — a real app that feels like it belongs on the device.

Vision: Dock an iPad at any satellite location (chalet, office, cabin) and control the entire galaxy. Voice input via on-device mic, camera feeds, climate zones, agent conversations, security dashboard — all through one interface that works offline when Tailscale drops and syncs when it reconnects.

Core features:

  • Voice-first: wake word, local speech-to-text, response via AirPlay to HomePods
  • Dashboard: camera feeds (Blink, Ring), HVAC zones, alarm status, agent health
  • Agent chat: Talk to any Jedi directly — routes to the right model via Smart Router
  • Offline mode: Full functionality on local network without internet
  • Multi-site: Switch between Manoir, Chalet, and any future satellite with one tap
  • Kiosk mode: Guided Access lockdown for always-on kitchen/hallway display

Tech stack: SwiftUI + local MLX inference + Home Assistant WebSocket + Sanctum API

Why Phase 3: The infrastructure (Smart Router, Model Tournament, 3-tier routing) must be solid before building a consumer surface on top. Phase 2 made the foundation trustworthy. Phase 3 makes it beautiful.

Intentionally not planned until reality argues otherwise:

  • Full abstraction away from local state. Sanctum is still a system with opinions about the machine it inhabits.
  • Pretending every private operator detail should become public product surface.
  • Rewriting the architecture pages to sound less personal. The voice is part of the system.

Phase 2 is complete when these statements are true:

  • a fresh compatible machine can bootstrap Sanctum from checked-in inputs with bounded manual setup
  • machine-specific assumptions are explicit instead of ambient
  • one release path runs the audits, runtime checks, and docs build as a coherent gate
  • the docs explain both the architecture and the operator workflow without relying on private memory
  • a public observer can understand what Sanctum proves without reading the entire repo constellation

Completed feature work no longer lives here. That material now belongs in:

Completed operator guidance now lives in:

That is intentional. A roadmap should describe remaining work, not cosplay as a changelog.