Engineering Discipline

Engineering Discipline — the security fortress where every commit is tested and every key is rotated.

Date: 2026-04-11
Status: Active

Running AI agents in your haus is either professional engineering or professional negligence. There is no middle ground. The same system that controls your kid’s screen time also holds your API keys, your health data, and the credentials to your fund’s deal flow pipeline. If that sentence doesn’t make you want to write tests, nothing will.

This page documents the standards that keep Sanctum from becoming a cautionary tale — the kind forwarded around Hacker News with subject lines like “man lets AI agents manage home, discovers API key on public GitHub three weeks later.” The rule is simple: if it’s not tested, it doesn’t exist. The corollary is the one this page keeps honest — if it’s deleted, it doesn’t get to claim coverage either.

Test Coverage

Every shipping component carries its own suite, and every suite runs independently. No untested code ships. No exceptions. No “we’ll add tests later.” Later is where bugs go to breed.

The components that matter are the ones still on disk. The table below tracks what actually exists and runs today, not the ghosts of training experiments past.

Component	Suite	What It Verifies
Cloud Proxy (`proxyd`)	`services/sanctum-proxy/test_e2e_proxy.sh`	Release build, binary present, `:4040` health responds
Sanctum Olympics	`sanctum-olympics/test_olympics.py`	Config loading, task YAML validation, judge parsing, mock e2e
Secret Rotation	`~/.sanctum/scripts/secret-rotation.sh --full`	6-phase sweep (see Secret Management below)
Docs Quality	`scripts/contrib-check.py`	Frontmatter, haus rule, emoji ban, `<digit` escapes, hero presence
Build Release	`tests/test_build_release.sh`	Prerequisites, binary, metallib, dist packaging, deploy flags

# Run a suite against the thing it actually tests — locally, no SSH
bash ~/Projects/sanctum-rs/services/sanctum-proxy/test_e2e_proxy.sh   # proxyd e2e
python3 ~/Projects/sanctum-olympics/test_olympics.py                  # olympics
bash ~/.sanctum/scripts/secret-rotation.sh --full                     # secret sweep
python3 ~/Documents/Claude_Code/sanctum-docs/scripts/contrib-check.py src/content/docs/architecture/engineering-discipline.mdx

Every suite can be run by a human who just woke up and doesn’t remember what day it is. That is a design requirement, not an accident — and the same discipline is why the rows that lost their code lost their place in the table.

Secret Management

Zero hardcoded API keys. Anywhere. Verified across every repo in both GitHub organizations — Ogilthorp3 and Triptyq-Capital, 57 repos and counting. The count keeps climbing; the invariant does not move. This is the line in the sand.

That line exists because of one incident. An API key was hardcoded in a Python file and pushed to a public GitHub repo, where it sat for three weeks. Nobody noticed. The key had full API access. The bill was… educational. That was the last time a secret touched source control, and the infrastructure that prevents it from ever happening again is aggressive by design.

Check	Status
Hardcoded keys in source files	0 (all repos, both machines)
Hardcoded keys in public repos	0 (Ogilthorp3 + Triptyq-Capital)
API keys in git history	Neutralized (old keys revoked)
OpenRouter key storage	macOS Keychain (both machines)
OpenRouter Management key	macOS Keychain (for programmatic rotation)
Key rotation	Zero-touch via `tools/secret-rotator/rotate.py`
Secret scanning	6-phase sweep via `secret-rotation.sh --full`
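The six phases of the sweep are not spelled out on this page, so the following is only a minimal sketch of what one pattern-matching phase could look like. The pattern names, the regexes, and `scan_text` are illustrative assumptions, not the contents of `secret-rotation.sh`.

```python
import re

# Hypothetical patterns for illustration only. Real key formats vary by
# provider, and the actual sweep may use entirely different rules.
KEY_PATTERNS = {
    "openrouter-style": re.compile(r"sk-or-[A-Za-z0-9_-]{20,}"),
    "anthropic-style": re.compile(r"sk-ant-[A-Za-z0-9_-]{20,}"),
    "generic-assignment": re.compile(
        r"(?i)api[_-]?key\s*[:=]\s*['\"][A-Za-z0-9_\-]{16,}['\"]"
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every key-shaped match."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in KEY_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

A scanner like this reports locations, never the matched value, so its own output is safe to log. Reading a key from the environment (for example `os.environ["K"]`) does not trip any of these patterns; only literal key-shaped strings do.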

Key Rotation Architecture

macOS Keychain
  ├── openrouter-api-key      → Used by proxyd + benchmarks
  ├── openrouter-mgmt-key     → Used by rotate.py (admin-only)
  └── anthropic-api-key       → Used by coding-llm-bench

tools/secret-rotator/rotate.py:
  1. Mint new value     (via provider Management API)
  2. Validate + verify  (shape check, then a real test call)
  3. Propagate          (Keychain + SOPS vault, every write backed up)
  4. Revoke old value   (via provider Management API)
  Zero browser. Zero human. Zero downtime.

The rotator doesn’t ask for permission. It mints the new value, proves it works, propagates it to the Keychain and the SOPS vault, and kills the old one — and because it validates the shape and aborts before propagation if verify fails, a half-rotated secret never reaches a service. If you’re still rotating API keys by logging into a dashboard and clicking buttons, you’re doing it wrong — and more importantly, you’re doing it rarely, which means your keys are old, which means the blast radius when one leaks is measured in months.
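The ordering is the whole point, so here is a minimal sketch of it. The `Provider` interface, `RotationAborted`, `rotate`, and `InMemoryProvider` are hypothetical names invented for this example and do not come from `rotate.py`; the backup-on-write behavior is not modeled, and revoking a rejected new key on abort is an assumption.

```python
from typing import Callable, Protocol


class Provider(Protocol):
    """Hypothetical management-API surface; not a real provider SDK."""

    def mint(self) -> str: ...
    def verify(self, key: str) -> bool: ...
    def revoke(self, key: str) -> None: ...


class RotationAborted(Exception):
    """Raised when a new key fails validation, before any store is written."""


def rotate(
    provider: Provider,
    old_key: str,
    looks_valid: Callable[[str], bool],
    stores: list[Callable[[str], None]],
) -> str:
    new_key = provider.mint()                    # 1. mint
    if not looks_valid(new_key) or not provider.verify(new_key):
        # 2. shape check, then a real test call. On failure, nothing has
        # been propagated yet. Revoking the rejected key is an assumption.
        provider.revoke(new_key)
        raise RotationAborted("new key failed validation")
    for write in stores:                         # 3. propagate
        write(new_key)
    provider.revoke(old_key)                     # 4. revoke old value last
    return new_key


class InMemoryProvider:
    """Stand-in provider for trying the flow without touching a real API."""

    def __init__(self, verify_ok: bool = True) -> None:
        self.verify_ok = verify_ok
        self.revoked: list[str] = []

    def mint(self) -> str:
        return "sk-demo-0001"

    def verify(self, key: str) -> bool:
        return self.verify_ok

    def revoke(self, key: str) -> None:
        self.revoked.append(key)
```

Because the old key is revoked only after every store has the new one, a failure at any earlier step leaves the old key live and the services untouched.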

Infrastructure Quality

This is a dated snapshot of a single cleanup sweep — measured 2026-04-11, not a live invariant. Some of it drifts back on purpose: every .bak-pre-X plist is a deliberate rollback point from the safe-edit ritual, so the disabled-file count breathes in and out as services get touched. The zeros below were true the day the broom came out; the discipline is re-running the broom, not pretending the dust never returns.

Metric (2026-04-11 sweep)	Before	After
LaunchAgents: disabled/backup files	23	0
Shell scripts without `set -euo pipefail`	13	0
Rust compiler warnings	4	0
Stale adapter directories	51 GB	1.9 GB (2 active)
CSS `!important` overrides	36	0
Inline CSS styles in MDX	9	0
Legacy image path conventions	55	0
`instance.yaml` stale services	5+	0 (all annotated)

Every row in that table represents a class of problem that existed, was measured, and was eliminated. The 23 disabled LaunchAgent files were ghosts from previous configurations — not hurting anything, not helping anything, just cluttering the namespace and making it harder to tell what was real. The 13 shell scripts without set -euo pipefail were time bombs waiting for a failed command to silently continue. The 51 GB of stale adapters were eleven training experiments that nobody cleaned up because nobody knew which ones were active.
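Re-measuring a row like the strict-mode count is easy to script. The sketch below is one hedged way to do it; `has_strict_mode` and `missing_strict_mode` are hypothetical helpers, not the tool used in the 2026-04-11 sweep, and the check only looks for the exact `set -euo pipefail` form.

```python
from pathlib import Path

STRICT_LINE = "set -euo pipefail"


def has_strict_mode(text: str) -> bool:
    """True if some non-comment line starts with the strict-mode command."""
    return any(line.strip().startswith(STRICT_LINE) for line in text.splitlines())


def missing_strict_mode(root: Path) -> list[Path]:
    """Return the *.sh files under root that lack the strict-mode line."""
    offenders = []
    for script in sorted(root.rglob("*.sh")):
        text = script.read_text(encoding="utf-8", errors="replace")
        if not has_strict_mode(text):
            offenders.append(script)
    return offenders
```

A commented-out `# set -euo pipefail` counts as missing, which is the conservative reading: a script that only mentions strict mode is not running under it.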

Container Doctrine

Most of Sanctum runs natively — Rust binaries, Python services, Node sidecars, all under launchd. When we do reach for a container, it is for software we did not write (Home Assistant, Outline, Postgres, Redis) and which we cannot meaningfully shrink. Those upstream images are 2.67 GB and 1.23 GB respectively, and they stay that way.

For our own containers the default is Chainguard wolfi-base plus a multi-stage build. Build the artifact in a stage with the full toolchain, then copy just the binary and its runtime deps into a wolfi-base final stage. The popular guidance (node:22-slim → node:22-alpine → gcr.io/distroless/nodejs22-debian12) is the right direction, but it stops one tier short. wolfi-base is rebuilt on a rolling basis from Wolfi packages, ships with glibc rather than musl (so no surprise libpython symbol mismatches), and carries a smaller CVE surface than distroless because the apk package graph is curated for image-only use.

Layer	Default
Build stage	`cgr.dev/chainguard/wolfi-base:latest` + `apk add --no-cache <toolchain>`
Runtime stage	`cgr.dev/chainguard/wolfi-base:latest` + only the runtime libs
`.dockerignore`	Required — never copy `node_modules`, `__pycache__`, `.venv`, tests, dev configs
User	A dedicated non-root user, dropped to with `USER <name>` as the last instruction — never root. Shannon uses `adduser -u 1001 -G pentest` then `USER pentest`
Healthcheck	One process check; do not curl an internal HTTP endpoint that depends on the network stack being up

Shannon (~/Projects/shannon/Dockerfile) is the canonical precedent in this workspace — wolfi-base builder and runtime, multi-stage, no apt-cache leftovers. Use it as the reference when adding any new custom container.

Cloud Proxy Hardening

The cloud proxy (proxyd, the Rust daemon at ~/Projects/sanctum-rs/services/sanctum-proxy, on :4040) sits between the council and the upstream providers. It started life as a Python script and is a Rust binary now — the usual Sanctum arc: Python while a thing is feature-organic, Rust once it’s load-bearing. It exists because giving six AI agents unlimited access to a pay-per-token cloud API is how you wake up to a surprise invoice. Five layers of protection:

Layer	What It Does
Key security	`keychain.rs` reads `PROVIDER_API_KEY` from the launchd env first, falls back to the macOS login Keychain — never in code or git
Fallback chain	Data-driven, from the `fallbacks:` map in `config.yaml` — not hardcoded
Cost control	Per-agent daily USD caps (`budget.rs`): `$2.00` Windu and Mothma, `$0.50` everyone else
Audit trail	Every call logged to `~/.sanctum/logs/cloud-proxy-audit.jsonl`: agent, model, tokens, cost, latency
Rate limiting	Budget exhaustion trips a 503 before the request ever reaches a provider
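The cost-control and rate-limiting rows combine into a single pre-flight check. What follows is a rough in-memory sketch of that idea, not a port of `budget.rs`: the `Budget` class, its method names, and the choice to refuse once spend reaches the cap (rather than exceeds it) are all assumptions. Only the cap values come from the table.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal
from typing import Optional

# Caps from the table above; agent names are lowercased for lookup.
DAILY_CAPS_USD = {"windu": Decimal("2.00"), "mothma": Decimal("2.00")}
DEFAULT_CAP_USD = Decimal("0.50")


class Budget:
    """In-memory per-agent daily spend tracker (illustrative only)."""

    def __init__(self) -> None:
        # Decimal() is zero, so unseen (agent, day) pairs start at $0.
        self._spent = defaultdict(Decimal)

    def cap_for(self, agent: str) -> Decimal:
        return DAILY_CAPS_USD.get(agent.lower(), DEFAULT_CAP_USD)

    def record(self, agent: str, cost_usd: Decimal, day: date) -> None:
        self._spent[(agent.lower(), day)] += cost_usd

    def precheck(self, agent: str, day: date) -> Optional[int]:
        """Return 503 if the day's cap is spent, else None (go upstream)."""
        spent = self._spent[(agent.lower(), day)]
        return 503 if spent >= self.cap_for(agent) else None
```

Keying spend by `(agent, day)` means the cap resets on its own when the date changes, with no scheduled cleanup job. `Decimal` avoids the float rounding drift that accumulates when summing many sub-cent costs.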

The fallback chain is the interesting part, and the interesting part is that it’s data. Each primary model names its understudies in config.yaml — council-brain falls to council-code (the qwen2.5-coder-14b-instruct seat on :1234), which falls to council-27b, and so on down the map. proxy.rs walks that chain at request time; no recompile to reorder it. When every link is exhausted, the proxy returns an honest error instead of a hallucinated response. Graceful degradation, not graceful lying.
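A minimal sketch of walking such a chain, assuming the parsed `fallbacks:` map has the shape "model name → next model name". The real `config.yaml` schema and the logic in `proxy.rs` may differ, and `chain_for`, `complete`, and `AllModelsExhausted` are hypothetical names.

```python
from typing import Callable, Optional

# Assumed shape of the parsed `fallbacks:` map: each model names the next
# one to try. Only the three model names come from the text above.
FALLBACKS = {
    "council-brain": "council-code",
    "council-code": "council-27b",
}


class AllModelsExhausted(Exception):
    """Every link in the chain failed; surface an honest error."""


def chain_for(model: str, fallbacks: dict[str, str]) -> list[str]:
    """Expand a primary model into its ordered fallback chain."""
    chain: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = model
    while current is not None and current not in seen:  # guard against cycles
        chain.append(current)
        seen.add(current)
        current = fallbacks.get(current)
    return chain


def complete(
    model: str,
    prompt: str,
    call: Callable[[str, str], str],
    fallbacks: dict[str, str] = FALLBACKS,
) -> str:
    """Try each model in order; raise if none of them answers."""
    errors = []
    for candidate in chain_for(model, fallbacks):
        try:
            return call(candidate, prompt)
        except Exception as exc:  # sketch: any failure means "try the next link"
            errors.append(f"{candidate}: {exc}")
    raise AllModelsExhausted("; ".join(errors))
```

Reordering the chain is then a data edit: change the map and the next request walks the new order. The cycle guard matters precisely because the chain is data, since a typo in the config should not turn into an infinite loop.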

The Standard

Every commit follows the same discipline:

Code — write it
Test — prove it works
Document — explain why it exists
Scan — verify no secrets leaked
Push — ship it

If any step fails, the commit doesn’t ship. This is not optional. This is not aspirational. This is how the haus runs.
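The "any step fails, nothing ships" rule can be pictured as an ordered gate that stops at the first failure. This is only a sketch: `run_gate` and the step names passed to it are hypothetical, and each check stands in for whatever command the real step runs.

```python
from typing import Callable

Step = tuple[str, Callable[[], bool]]


def run_gate(steps: list[Step]) -> tuple[bool, list[str]]:
    """Run steps in order, stopping at the first failure.

    Returns (shipped, names_of_steps_that_ran).
    """
    ran = []
    for name, check in steps:
        ran.append(name)
        if not check():
            return False, ran  # later steps, including push, never run
    return True, ran
```

Putting the push last in the list is what makes the gate meaningful: a failed scan returns before the push step is ever reached.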

Because the only thing worse than an AI agent making a mistake is one making a mistake with your API keys, your health data, and your kid’s screen time schedule all in the same system — and nobody finding out until the credit card bill arrives. The discipline isn’t paranoia. It’s the minimum viable responsibility for running this kind of infrastructure in a place where people sleep.