The most important interface in Sanctum is the one the operator uses without thinking: a normal Signal thread with “Yoda” — same UX as texting a friend. Behind that thread sits a 35B-parameter Qwen running locally on the Mac Mini, an openclaw agent with the Jedi Master’s persona, and seven hops between the two.
When the chat is healthy, none of that is visible. When it isn’t, the silence is deafening — and we learned the hard way that silence has many causes.
The Holocron Bridge is the system that keeps the chat alive: the transport, the dispatcher, the agent, the watchdog, and a single CLI that gives you the truth in two seconds.
Every component shares the same Python SignalTransport: a single persistent JSON-RPC-over-TCP client to signal-cli. It reconnects with bounded exponential backoff, fails fast on disconnect (so callers don’t hang), and allows one writer per process. The result: no race between subscribers and no zombie sockets.
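A minimal sketch of that reconnect policy, not the actual SignalTransport code. The class and function names, the default delays, and the attempt limit are all assumptions made for illustration; "bounded" is read here as both a cap on each delay and a cap on the number of attempts.

```python
import threading
import time
from typing import Callable, Iterator


def backoff_delays(base: float = 0.5, factor: float = 2.0, cap: float = 30.0) -> Iterator[float]:
    """Yield base, base*factor, ... with every delay clamped to cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


class TransportDown(ConnectionError):
    """Raised at once when a caller writes while the socket is down."""


class ReconnectingClient:
    """Hypothetical stand-in for SignalTransport; `connect` returns a socket-like object."""

    def __init__(self, connect: Callable[[], object], max_attempts: int = 8,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._connect = connect
        self._max_attempts = max_attempts
        self._sleep = sleep
        self._conn = None
        self._write_lock = threading.Lock()  # one writer at a time

    def ensure_connected(self) -> None:
        delays = backoff_delays()
        for attempt in range(1, self._max_attempts + 1):
            try:
                self._conn = self._connect()
                return
            except OSError:
                if attempt == self._max_attempts:
                    raise  # bounded: give up instead of retrying forever
                self._sleep(next(delays))

    def mark_disconnected(self) -> None:
        self._conn = None

    def send(self, payload: bytes) -> None:
        with self._write_lock:
            if self._conn is None:
                # Fail fast: the caller gets an error now instead of hanging.
                raise TransportDown("transport is down; not blocking the caller")
            self._conn.sendall(payload)
```

Injecting `sleep` keeps the retry loop testable without real waiting, and raising from `send` pushes the decision about retrying back to the caller.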
No silent failures
Every hop has a structured log. Every envelope is classified — data / receipt / typing / sync / other. Receipts and typing indicators are visibly logged so silence in the log reflects an actual silence, not a misparse. The Connection closed unexpectedly line that used to look terrifying is now a footnote — it’s signal-cli’s normal long-poll cycle.
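The five-way classification can be sketched as follows. This is an illustration, not the dispatcher's real code: the field names (dataMessage, receiptMessage, typingMessage, syncMessage) are assumed to match signal-cli's JSON envelope output, and the logger name and log shape are made up for the example.

```python
import json
import logging

log = logging.getLogger("yoda_chat.envelope")  # hypothetical logger name

# Assumed signal-cli envelope fields, checked in order; first match wins.
_KINDS = (
    ("dataMessage", "data"),
    ("receiptMessage", "receipt"),
    ("typingMessage", "typing"),
    ("syncMessage", "sync"),
)


def classify_envelope(envelope: dict) -> str:
    """Return one of: data, receipt, typing, sync, other."""
    for field, kind in _KINDS:
        if envelope.get(field) is not None:
            return kind
    return "other"


def log_envelope(envelope: dict) -> str:
    """Log every envelope, including receipts and typing, as one JSON line."""
    kind = classify_envelope(envelope)
    log.info(json.dumps({"event": "envelope", "kind": kind,
                         "source": envelope.get("source")}))
    return kind
```

Because nothing is dropped before logging, a quiet log means no envelopes arrived, which is the property the paragraph above relies on.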
Self-heal in 10 minutes
A yoda-chat-doctor.timer runs every 10 min on the VM. It stamps any unstamped workspace, archives any poisoned session, and restarts any dead service. On the Mac, plain launchd KeepAlive respawns com.sanctum.signal-cli, com.sanctum.signal-tcp-bridge, and com.sanctum.mlx, while the com.sanctum.signal-health sentinel kicks signal-cli when it wedges without dying. The chat heals itself before you notice.
Six rows, not seven — the gateway probe came out when the gateway came out of the path.
yoda-chat doctor walks the same checks but applies remediation:
Workspace state missing setupCompletedAt? Stamp it.
Session JSONL has two consecutive bootstrap-pending replies? Archive it.
Proxy or consumer down? systemctl --user restart.
yoda-chat e2e is the no-mercy regression test. It exercises every hop in order — signal-cli TCP version probe, proxy /api/v1/check, mlx /v1/models, an agent CLI round-trip that demands the word PONG, then (unless --no-deliver) a real proxy → operator send and an agent --deliver round-trip — and prints a green checkmark per stage with a duration. If it’s all green, the chat works. If not, the failing stage names the problem.
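The stage-by-stage shape of such a test can be sketched like this. It is a simplified illustration with hypothetical names, and it assumes the run stops at the first failing stage; the real yoda-chat e2e may behave differently.

```python
import time
from typing import Callable, Sequence, Tuple

Stage = Tuple[str, Callable[[], None]]


def require_pong(reply: str) -> None:
    """Fail the stage unless the agent's reply contains the word PONG."""
    if "PONG" not in reply:
        raise AssertionError(f"expected PONG in reply, got: {reply!r}")


def run_stages(stages: Sequence[Stage], out: Callable[[str], None] = print) -> bool:
    """Run stages in order; print one line per stage with its duration."""
    for name, check in stages:
        start = time.monotonic()
        try:
            check()  # a stage passes by returning and fails by raising
        except Exception as exc:
            out(f"✗ {name} ({time.monotonic() - start:.2f}s): {exc}")
            return False  # assumption: later hops are not tested after a failure
        out(f"✓ {name} ({time.monotonic() - start:.2f}s)")
    return True
```

Each real stage (version probe, proxy check, model listing, agent round-trip) would be one callable in the list, so the failing line names the broken hop.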
Hard-won lessons (the ones we never want to relearn)
Bundled openclaw extensions stage their npm deps on every load.
The VM is offline. Every load hung forever on npm install @mariozechner/pi-ai. Disabling each extension didn’t help (the staging path bypasses the deny list); the fix was to pre-populate sentinel package.json files in each extension’s node_modules/<dep>/ so the staging code believed the deps were already there. The runtime has since moved to a pnpm-managed store and the extensions/ dir is empty, so the sentinels are gone — but the lesson (stage, don’t fetch, on an air-gapped box) is the one we’d relearn the hard way.
signal-cli-rest-api Docker containers are retired.
Native signal-cli on TCP :7583 is the live transport. Don’t put a REST shim in front. The previous signal-proxy.js translated JSON-RPC ⇄ REST against dead Docker IPs (10.0.0.1:18081/18082). The new yoda-chat proxy forwards JSON-RPC verbatim to TCP — no translation, no impedance mismatch.
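For reference, talking to the daemon directly needs very little code. The sketch below assumes signal-cli's TCP mode speaks newline-delimited JSON-RPC 2.0 and that the daemon can interleave unsolicited notifications with replies; the host default and the helper names are hypothetical, and only port 7583 comes from the setup described here.

```python
import json
import socket
from typing import Iterable, Optional


def encode_request(method: str, params: Optional[dict] = None, request_id: int = 1) -> bytes:
    """Build one JSON-RPC 2.0 request as a single newline-terminated line."""
    msg = {"jsonrpc": "2.0", "method": method, "id": request_id}
    if params is not None:
        msg["params"] = params
    return (json.dumps(msg) + "\n").encode("utf-8")


def first_response(lines: Iterable[bytes], request_id: int) -> Optional[dict]:
    """Return the reply whose id matches, skipping notifications in between."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        msg = json.loads(raw)
        if isinstance(msg, dict) and msg.get("id") == request_id and "method" not in msg:
            return msg
    return None


def call(method: str, params: Optional[dict] = None,
         host: str = "127.0.0.1", port: int = 7583, timeout: float = 5.0) -> Optional[dict]:
    """One-shot request/response, e.g. call("version") as a liveness probe."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_request(method, params))
        with sock.makefile("rb") as reader:
            return first_response(reader, 1)
```

A forwarding proxy built on the same framing passes each line through unchanged, so there is no REST schema to drift out of sync with the daemon.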
Session poisoning is the silent killer.
If the agent ever falls into a “Bootstrap is still pending” reply (because some workspace hasn’t been stamped), every subsequent ambiguous prompt continues that pattern. The model is doing the right thing — continuing the conversation. But the conversation is poisoned. The session JSONL has to be archived. yoda_chat.session.archive_poisoned_session_for() does this automatically; the doctor runs it on every tick.
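The detection rule (two consecutive bootstrap-pending replies) can be sketched as below. This is not the body of archive_poisoned_session_for(): the JSONL schema (one object per line with "role" and "content" keys), the function names, and the archive naming scheme are all assumptions for illustration.

```python
import json
import time
from pathlib import Path

MARKER = "Bootstrap is still pending"


def looks_poisoned(jsonl_text: str, marker: str = MARKER, threshold: int = 2) -> bool:
    """True if `threshold` assistant replies in a row start with the marker."""
    streak = 0
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            turn = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a truncated or garbled line
        if not isinstance(turn, dict) or turn.get("role") != "assistant":
            continue  # user turns do not break or extend the streak
        if str(turn.get("content", "")).lstrip().startswith(marker):
            streak += 1
            if streak >= threshold:
                return True
        else:
            streak = 0
    return False


def archive_session(path: Path) -> Path:
    """Move the session aside so the next turn starts from a clean history."""
    target = path.with_name(f"{path.name}.poisoned-{int(time.time())}")
    path.rename(target)
    return target
```

Renaming instead of deleting keeps the poisoned transcript available for later inspection.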
setupCompletedAt must be set in every workspace.
That means the per-agent sandbox (sandboxes/agent-<id>-*/.openclaw/workspace-state.json), the runtime workspace (workspace/.openclaw/workspace-state.json), and the per-Jedi siblings: workspace-windu, workspace-cilghal, workspace-mundi, workspace-quigon. stamp_workspaces_complete() does all of them.
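A rough sketch of what stamping involves, not the real stamp_workspaces_complete(). It assumes the sibling workspaces keep their state at the same .openclaw/workspace-state.json relative path, that an ISO-8601 UTC timestamp is an acceptable value for setupCompletedAt, and that all workspaces live under one root directory; the function names are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Tuple

# Glob patterns relative to an assumed openclaw root directory.
STATE_PATTERNS = (
    "sandboxes/agent-*/.openclaw/workspace-state.json",
    "workspace/.openclaw/workspace-state.json",
    "workspace-*/.openclaw/workspace-state.json",  # windu, cilghal, mundi, quigon
)


def with_setup_stamp(state: dict, now_iso: str) -> Tuple[dict, bool]:
    """Return (state, changed); never overwrite an existing stamp."""
    if state.get("setupCompletedAt"):
        return state, False
    stamped = dict(state)
    stamped["setupCompletedAt"] = now_iso
    return stamped, True


def stamp_all(root: Path) -> List[Path]:
    """Stamp every unstamped state file under root; return the files changed."""
    now_iso = datetime.now(timezone.utc).isoformat()
    changed = []
    for pattern in STATE_PATTERNS:
        for path in sorted(root.glob(pattern)):
            state = json.loads(path.read_text() or "{}")
            stamped, did_change = with_setup_stamp(state, now_iso)
            if did_change:
                path.write_text(json.dumps(stamped, indent=2))
                changed.append(path)
    return changed
```

Keeping the stamp idempotent is what makes it safe to run on every doctor tick.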
The 100ms WebSocket reconnect log is normal.
signal-cli polls Signal Server in short bursts; the “Connection closed unexpectedly, reconnecting in 100 ms” line is the long-poll cycle completing, not a fault. Don’t restart the daemon on this signal.
pkill -f openclaw over SSH closes your own session.
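pkill -f matches against full command lines, and the remote shell that SSH started to run your command has openclaw in its own command line, so it gets killed along with the target. Use specific PIDs, pkill -x (exact match), or pkill -f <very-specific>.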
When the chat goes silent — diagnose in 60 seconds
Run yoda-chat status from either host, Mac or VM. The first failing row names the broken layer.
If it’s all green but no reply lands: check the session row. If turns is high and Last out: starts with “Bootstrap is still pending”, the session is poisoned. yoda-chat doctor archives it.
If signal-cli TCP is red: the Mac daemon is down or the bridge is broken. On Mac: launchctl list | grep com.sanctum.signal should show com.sanctum.signal-cli and com.sanctum.signal-tcp-bridge. Kickstart with launchctl kickstart -k gui/$(id -u)/com.sanctum.signal-cli.
If signal-proxy is red but signal-cli TCP is green: the VM proxy is down. ssh ubuntu@10.0.0.10 systemctl --user restart yoda-chat-proxy. The doctor timer will catch this within 10 min anyway.
If mlx bridge is red: sanctum-mlx is down or the TLS-termination bridge isn’t running. The bridge is com.sanctum.yoda-plain-bridge (it terminates TLS at :1339 onto sanctum-mlx at :1337). launchd KeepAlive usually respawns it, but if it’s wedged, kick it manually: launchctl kickstart -k gui/$(id -u)/com.sanctum.yoda-plain-bridge.
If everything is green and you still feel something is off: run yoda-chat e2e. It will send two test messages to the operator’s Signal — proof of life from every hop.
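The triage order above can be written down as a small lookup. The sketch below is illustrative only: the row names are taken from the steps in this section (the full status output has more rows), the commands are the ones quoted above, and the function and argument names are hypothetical.

```python
from typing import Sequence, Tuple

# First action to try for each failing row named in the steps above.
HINTS = {
    "signal-cli TCP": "launchctl kickstart -k gui/$(id -u)/com.sanctum.signal-cli",
    "signal-proxy": "ssh ubuntu@10.0.0.10 systemctl --user restart yoda-chat-proxy",
    "mlx bridge": "launchctl kickstart -k gui/$(id -u)/com.sanctum.yoda-plain-bridge",
}

POISON_MARKER = "Bootstrap is still pending"


def triage(rows: Sequence[Tuple[str, bool]], last_out: str = "") -> str:
    """Return the next command to run, given (row name, is_green) pairs in order."""
    for name, is_green in rows:
        if not is_green:
            # The first failing row names the broken layer.
            return HINTS.get(name, f"investigate: {name}")
    if last_out.startswith(POISON_MARKER):
        return "yoda-chat doctor"  # all green but the session is poisoned
    return "yoda-chat e2e"  # all green: prove every hop end to end
```

Checking rows in order matters: a red signal-cli TCP row usually explains a red proxy row downstream, so the upstream fix comes first.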