Engineering Discipline

Engineering Discipline — the security fortress where every commit is tested and every key is rotated.

Date: 2026-04-11 Status: Active

Running AI agents in your haus is either professional engineering or professional negligence. There is no middle ground. The same system that controls your kid’s screen time also holds your API keys, your health data, and the credentials to your fund’s deal flow pipeline. If that sentence doesn’t make you want to write tests, nothing will.

This page documents the standards that keep Sanctum from becoming a cautionary tale — the kind forwarded around Hacker News with subject lines like “man lets AI agents manage home, discovers API key on public GitHub three weeks later.” The rule is simple: if it’s not tested, it doesn’t exist. The corollary is the one this page keeps honest — if it’s deleted, it doesn’t get to claim coverage either.

Every shipping component carries its own suite, and every suite runs independently. No untested code ships. No exceptions. No “we’ll add tests later.” Later is where bugs go to breed.

The components that matter are the ones still on disk. The table below tracks what actually exists and runs today, not the ghosts of training experiments past.

| Component | Suite | What It Verifies |
| --- | --- | --- |
| Cloud Proxy (proxyd) | services/sanctum-proxy/test_e2e_proxy.sh | Release build, binary present, :4040 health responds |
| Sanctum Olympics | sanctum-olympics/test_olympics.py | Config loading, task YAML validation, judge parsing, mock e2e |
| Secret Rotation | ~/.sanctum/scripts/secret-rotation.sh --full | 6-phase sweep (see below) |
| Docs Quality | scripts/contrib-check.py | Frontmatter, haus rule, emoji ban, <digit escapes, hero presence |
| Build Release | tests/test_build_release.sh | Prerequisites, binary, metallib, dist packaging, deploy flags |

# Run a suite against the thing it actually tests — locally, no SSH
bash ~/Projects/sanctum-rs/services/sanctum-proxy/test_e2e_proxy.sh # proxyd e2e
python3 ~/Projects/sanctum-olympics/test_olympics.py # olympics
bash ~/.sanctum/scripts/secret-rotation.sh --full # secret sweep
python3 ~/Documents/Claude_Code/sanctum-docs/scripts/contrib-check.py src/content/docs/architecture/engineering-discipline.mdx

Every suite can be run by a human who just woke up and doesn’t remember what day it is. That is a design requirement, not an accident — and the same discipline is why the rows that lost their code lost their place in the table.
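
The "every suite runs independently" rule can be sketched as a tiny runner. This is an illustration only, not a script that ships with Sanctum: `run_suites` and `SuiteResult` are hypothetical names, and the commands you pass in are whatever the table lists.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class SuiteResult:
    name: str
    returncode: int

    @property
    def passed(self) -> bool:
        return self.returncode == 0


def run_suites(suites: dict[str, list[str]]) -> list[SuiteResult]:
    """Run each suite as its own process; one failure never skips the rest."""
    results = []
    for name, argv in suites.items():
        # check=False: a non-zero exit is recorded instead of raised,
        # so the remaining suites still get their turn.
        proc = subprocess.run(argv, check=False)
        results.append(SuiteResult(name, proc.returncode))
    return results
```

Feeding it something like `{"olympics": ["python3", "test_olympics.py"]}` returns one result per suite, and a caller can exit non-zero if any `passed` is false.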

Zero hardcoded API keys. Anywhere. Verified across every repo in both GitHub organizations — Ogilthorp3 and Triptyq-Capital, 57 repos and counting. The count keeps climbing; the invariant does not move. This is the line in the sand.

The API key was hardcoded in a Python file pushed to a public GitHub repo. For three weeks. Nobody noticed. The key had full API access. The bill was… educational. That was the last time a secret touched source control, and the infrastructure that prevents it from ever happening again is aggressive by design.

| Check | Status |
| --- | --- |
| Hardcoded keys in source files | 0 (all repos, both machines) |
| Hardcoded keys in public repos | 0 (Ogilthorp3 + Triptyq-Capital) |
| API keys in git history | Neutralized (old keys revoked) |
| OpenRouter key storage | macOS Keychain (both machines) |
| OpenRouter Management key | macOS Keychain (for programmatic rotation) |
| Key rotation | Zero-touch via tools/secret-rotator/rotate.py |
| Secret scanning | 6-phase sweep via secret-rotation.sh --full |

macOS Keychain
├── openrouter-api-key → Used by proxyd + benchmarks
├── openrouter-mgmt-key → Used by rotate.py (admin-only)
└── anthropic-api-key → Used by coding-llm-bench
tools/secret-rotator/rotate.py:
1. Mint new value (via provider Management API)
2. Validate + verify (shape check, then a real test call)
3. Propagate (Keychain + SOPS vault, every write backed up)
4. Revoke old value (via provider Management API)
Zero browser. Zero human. Zero downtime.

The rotator doesn’t ask for permission. It mints the new value, proves it works, propagates it to the Keychain and the SOPS vault, and kills the old one — and because it validates the shape and aborts before propagation if verify fails, a half-rotated secret never reaches a service. If you’re still rotating API keys by logging into a dashboard and clicking buttons, you’re doing it wrong — and more importantly, you’re doing it rarely, which means your keys are old, which means the blast radius when one leaks is measured in months.
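
The ordering guarantee described above (nothing is propagated until the new value has passed both checks, and the old value is revoked last) can be sketched as follows. This is a minimal illustration, not the contents of rotate.py: the `Rotator` class, its five hook names, and `RotationAborted` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


class RotationAborted(Exception):
    """Raised when the new value fails the shape check or the test call."""


@dataclass
class Rotator:
    # All five callables are hypothetical hooks, not rotate.py's real interface.
    mint: Callable[[], str]            # ask the provider for a new value
    shape_ok: Callable[[str], bool]    # cheap format check
    verify: Callable[[str], bool]      # real test call using the new value
    propagate: Callable[[str], None]   # write to the Keychain and the SOPS vault
    revoke: Callable[[str], None]      # kill a value at the provider

    def rotate(self, old_value: str) -> str:
        new_value = self.mint()
        # Validate and verify BEFORE anything is written anywhere,
        # so a half-rotated secret never reaches a service.
        if not self.shape_ok(new_value) or not self.verify(new_value):
            raise RotationAborted("new value failed checks; old value left in place")
        self.propagate(new_value)
        # The old value is revoked only after the new one is in place.
        self.revoke(old_value)
        return new_value
```

Because the hooks are injected, the ordering can be tested with plain lambdas and no provider account.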

This is a dated snapshot of a single cleanup sweep — measured 2026-04-11, not a live invariant. Some of it drifts back on purpose: every .bak-pre-X plist is a deliberate rollback point from the safe-edit ritual, so the disabled-file count breathes in and out as services get touched. The zeros below were true the day the broom came out; the discipline is re-running the broom, not pretending the dust never returns.

| Metric (2026-04-11 sweep) | Before | After |
| --- | --- | --- |
| LaunchAgents: disabled/backup files | 23 | 0 |
| Shell scripts without set -euo pipefail | 13 | 0 |
| Rust compiler warnings | 4 | 0 |
| Stale adapter directories | 51 GB | 1.9 GB (2 active) |
| CSS !important overrides | 36 | 0 |
| Inline CSS styles in MDX | 9 | 0 |
| Legacy image path conventions | 55 | 0 |
| instance.yaml stale services | 5+ | 0 (all annotated) |

Every row in that table represents a class of problem that existed, was measured, and was eliminated. The 23 disabled LaunchAgent files were ghosts from previous configurations — not hurting anything, not helping anything, just cluttering the namespace and making it harder to tell what was real. The 13 shell scripts without set -euo pipefail were time bombs waiting for a failed command to silently continue. The 51 GB of stale adapters were eleven training experiments that nobody cleaned up because nobody knew which ones were active.
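
Re-running the broom on the shell-script row can be as small as the sketch below. It is an assumption-laden illustration, not the sweep's actual tooling: the function names are made up, it only looks at `*.sh` files, and it demands the line exactly as written, so a variant such as `set -eu` plus a separate `set -o pipefail` would be reported as missing.

```python
from pathlib import Path

REQUIRED = "set -euo pipefail"


def has_strict_mode(script_text: str) -> bool:
    """True if some non-comment line is exactly the strict-mode line."""
    for line in script_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # a commented-out `set -euo pipefail` does not count
        if stripped == REQUIRED:
            return True
    return False


def scripts_missing_strict_mode(root: Path) -> list[Path]:
    """Return every *.sh file under root that lacks the strict-mode line."""
    return sorted(
        path
        for path in root.rglob("*.sh")
        if not has_strict_mode(path.read_text(encoding="utf-8", errors="replace"))
    )
```

An empty list from `scripts_missing_strict_mode(Path.home() / ".sanctum" / "scripts")` would correspond to the zero in the table for that directory.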

Most of Sanctum runs natively — Rust binaries, Python services, Node sidecars, all under launchd. When we do reach for a container, it is for software we did not write (Home Assistant, Outline, Postgres, Redis) and which we cannot meaningfully shrink. Those upstream images are 2.67 GB and 1.23 GB respectively, and they stay that way.

For our own containers the default is Chainguard wolfi-base plus a multi-stage build. Build the artifact in a stage with the full toolchain, then copy just the binary and its runtime deps into a wolfi-base final stage. The popular guidance (node:22-slim → node:22-alpine → gcr.io/distroless/nodejs22-debian12) is the right direction, but it stops one tier short. wolfi-base is built rolling with Wolfi packages, ships with glibc (not musl, no surprise libpython symbol mismatches), and carries a smaller CVE surface than distroless because the apk package graph is curated for image-only use.

| Layer | Default |
| --- | --- |
| Build stage | cgr.dev/chainguard/wolfi-base:latest + apk add --no-cache <toolchain> |
| Runtime stage | cgr.dev/chainguard/wolfi-base:latest + only the runtime libs |
| .dockerignore | Required: never copy node_modules, __pycache__, .venv, tests, dev configs |
| User | A dedicated non-root user, dropped to with USER <name> as the last instruction; never root. Shannon uses adduser -u 1001 -G pentest then USER pentest |
| Healthcheck | One process check; do not curl an internal HTTP endpoint that depends on the network stack being up |

Shannon (~/Projects/shannon/Dockerfile) is the canonical precedent in this workspace — wolfi-base builder and runtime, multi-stage, no apt-cache leftovers. Use it as the reference when adding any new custom container.

The cloud proxy (proxyd, the Rust daemon at ~/Projects/sanctum-rs/services/sanctum-proxy, on :4040) sits between the council and the upstream providers. It started life as a Python script and is a Rust binary now — the usual Sanctum arc: Python while a thing is feature-organic, Rust once it’s load-bearing. It exists because giving six AI agents unlimited access to a pay-per-token cloud API is how you wake up to a surprise invoice. Five layers of protection:

| Layer | What It Does |
| --- | --- |
| Key security | keychain.rs reads PROVIDER_API_KEY from the launchd env first, falls back to the macOS login Keychain; never in code or git |
| Fallback chain | Data-driven, from the fallbacks: map in config.yaml, not hardcoded |
| Cost control | Per-agent daily USD caps (budget.rs): $2.00 Windu and Mothma, $0.50 everyone else |
| Audit trail | Every call logged to ~/.sanctum/logs/cloud-proxy-audit.jsonl: agent, model, tokens, cost, latency |
| Rate limiting | Budget exhaustion trips a 503 before the request ever reaches a provider |
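
The cost-control and rate-limiting rows interact in a simple way: spend is tracked per agent, and once an agent reaches its cap the answer is a 503 with no upstream call. A minimal sketch of that gate follows. The real logic lives in budget.rs, so this Python is an illustration only: `BudgetGate` is a hypothetical name, agent names are lowercased by assumption, and the daily reset is left out.

```python
from collections import defaultdict
from decimal import Decimal

# Caps taken from the table above; the lowercase spelling is an assumption.
DAILY_CAPS_USD = {"windu": Decimal("2.00"), "mothma": Decimal("2.00")}
DEFAULT_CAP_USD = Decimal("0.50")


class BudgetGate:
    """Tracks per-agent spend for one day and refuses calls once the cap is hit."""

    def __init__(self) -> None:
        # Decimal() is zero, so unseen agents start with nothing spent.
        self._spent: dict[str, Decimal] = defaultdict(Decimal)

    def cap_for(self, agent: str) -> Decimal:
        return DAILY_CAPS_USD.get(agent.lower(), DEFAULT_CAP_USD)

    def check(self, agent: str) -> int:
        """Return the HTTP status to answer with before contacting a provider."""
        exhausted = self._spent[agent.lower()] >= self.cap_for(agent)
        return 503 if exhausted else 200

    def record(self, agent: str, cost_usd: Decimal) -> None:
        """Add the cost of a completed call to the agent's running total."""
        self._spent[agent.lower()] += cost_usd
```

Decimal is used instead of float so that fifty one-cent calls add up to exactly $0.50 and trip the cap on schedule.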

The fallback chain is the interesting part, and the interesting part is that it’s data. Each primary model names its understudies in config.yaml: council-brain falls to council-code (the qwen2.5-coder-14b-instruct seat on :1234), which falls to council-27b, and so on down the map. proxy.rs walks that chain at request time; no recompile to reorder it. When every link is exhausted, the proxy returns an honest error instead of a hallucinated response. Graceful degradation, not graceful lying.
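
Walking a data-driven chain looks roughly like the sketch below. The real walker is proxy.rs, and the exact shape of the fallbacks: map is not shown on this page, so the one-understudy-per-model dictionary, the function names, and the `call` hook are all assumptions made for illustration.

```python
from typing import Callable, Optional

# Assumed shape: each model names a single understudy.
# The council-brain -> council-code -> council-27b order comes from the text above.
FALLBACKS = {
    "council-brain": "council-code",
    "council-code": "council-27b",
}


class AllModelsExhausted(Exception):
    """Every link in the chain failed; surface an error, not a made-up answer."""


def chain_for(model: str, fallbacks: dict[str, str]) -> list[str]:
    """Expand a primary model into its ordered chain, stopping on a cycle."""
    chain: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = model
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = fallbacks.get(current)
    return chain


def complete(
    model: str,
    prompt: str,
    call: Callable[[str, str], str],
    fallbacks: dict[str, str] = FALLBACKS,
) -> str:
    """Try each model in the chain; raise only when all of them have failed."""
    errors = []
    for candidate in chain_for(model, fallbacks):
        try:
            return call(candidate, prompt)
        except Exception as exc:  # any upstream failure moves to the next link
            errors.append(f"{candidate}: {exc}")
    raise AllModelsExhausted("; ".join(errors))
```

Reordering the chain means editing the dictionary, which is the point of keeping it as data.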

Every commit follows the same discipline:

  1. Code — write it
  2. Test — prove it works
  3. Document — explain why it exists
  4. Scan — verify no secrets leaked
  5. Push — ship it

If any step fails, the commit doesn’t ship. This is not optional. This is not aspirational. This is how the haus runs.
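
The "any step fails, nothing ships" rule, together with a toy version of the scan step, can be sketched as below. This is not the 6-phase sweep in secret-rotation.sh: the two key prefixes are assumptions about what provider keys look like, and `find_secrets` and `run_gate` are hypothetical names.

```python
import re
from typing import Callable, Optional

# Illustrative patterns only. The prefixes are assumed key shapes,
# not the rules used by secret-rotation.sh.
SECRET_PATTERNS = [
    re.compile(r"sk-or-[A-Za-z0-9_-]{20,}"),
    re.compile(r"sk-ant-[A-Za-z0-9_-]{20,}"),
]


def find_secrets(text: str) -> list[str]:
    """Return every substring that looks like one of the assumed key shapes."""
    hits: list[str] = []
    for pattern in SECRET_PATTERNS:
        hits.extend(pattern.findall(text))
    return hits


def run_gate(steps: list[tuple[str, Callable[[], bool]]]) -> Optional[str]:
    """Run steps in order; return the first failing step's name, or None."""
    for name, step in steps:
        if not step():
            return name  # later steps, including push, never run
    return None
```

A caller would put push last in `steps`, so a failed test or a scan hit stops the sequence before anything leaves the machine.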

Because the only thing worse than an AI agent making a mistake is one making a mistake with your API keys, your health data, and your kid’s screen time schedule all in the same system — and nobody finding out until the credit card bill arrives. The discipline isn’t paranoia. It’s the minimum viable responsibility for running this kind of infrastructure in a place where people sleep.