Engineering Discipline

Date: 2026-04-11 Status: Active
Running AI agents in your haus is either professional engineering or professional negligence. There is no middle ground. The same system that controls your kid’s screen time also holds your API keys, your health data, and the credentials to your fund’s deal flow pipeline. If that sentence doesn’t make you want to write tests, nothing will.
This page documents the standards that keep Sanctum from becoming a cautionary tale — the kind forwarded around Hacker News with subject lines like “man lets AI agents manage home, discovers API key on public GitHub three weeks later.” The rule is simple: if it’s not tested, it doesn’t exist. The corollary is the one this page keeps honest — if it’s deleted, it doesn’t get to claim coverage either.
Test Coverage
Every shipping component carries its own suite, and every suite runs independently. No untested code ships. No exceptions. No “we’ll add tests later.” Later is where bugs go to breed.
The components that matter are the ones still on disk. The table below tracks what actually exists and runs today, not the ghosts of training experiments past.
| Component | Suite | What It Verifies |
|---|---|---|
| Cloud Proxy (proxyd) | services/sanctum-proxy/test_e2e_proxy.sh | Release build, binary present, :4040 health responds |
| Sanctum Olympics | sanctum-olympics/test_olympics.py | Config loading, task YAML validation, judge parsing, mock e2e |
| Secret Rotation | ~/.sanctum/scripts/secret-rotation.sh --full | 6-phase sweep (see below) |
| Docs Quality | scripts/contrib-check.py | Frontmatter, haus rule, emoji ban, <digit escapes, hero presence |
| Build Release | tests/test_build_release.sh | Prerequisites, binary, metallib, dist packaging, deploy flags |

    # Run a suite against the thing it actually tests — locally, no SSH
    bash ~/Projects/sanctum-rs/services/sanctum-proxy/test_e2e_proxy.sh   # proxyd e2e
    python3 ~/Projects/sanctum-olympics/test_olympics.py                  # olympics
    bash ~/.sanctum/scripts/secret-rotation.sh --full                     # secret sweep
    python3 ~/Documents/Claude_Code/sanctum-docs/scripts/contrib-check.py src/content/docs/architecture/engineering-discipline.mdx

Every suite can be run by a human who just woke up and doesn’t remember what day it is. That is a design requirement, not an accident — and the same discipline is why the rows that lost their code lost their place in the table.
Secret Management
Zero hardcoded API keys. Anywhere. Verified across every repo in both GitHub organizations — Ogilthorp3 and Triptyq-Capital, 57 repos and counting. The count keeps climbing; the invariant does not move. This is the line in the sand.
An API key was once hardcoded in a Python file and pushed to a public GitHub repo. For three weeks. Nobody noticed. The key had full API access. The bill was… educational. That was the last time a secret touched source control, and the infrastructure that prevents it from ever happening again is aggressive by design.
| Check | Status |
|---|---|
| Hardcoded keys in source files | 0 (all repos, both machines) |
| Hardcoded keys in public repos | 0 (Ogilthorp3 + Triptyq-Capital) |
| API keys in git history | Neutralized (old keys revoked) |
| OpenRouter key storage | macOS Keychain (both machines) |
| OpenRouter Management key | macOS Keychain (for programmatic rotation) |
| Key rotation | Zero-touch via tools/secret-rotator/rotate.py |
| Secret scanning | 6-phase sweep via secret-rotation.sh --full |
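
A single phase of a sweep like this can be approximated with a regex pass over source text. The sketch below is an illustration under stated assumptions, not the contents of secret-rotation.sh: the `scan_text` helper, the `KEY_PATTERNS` table, and the key prefixes in it are hypothetical stand-ins for whatever the real sweep matches.

```python
import re

# Hypothetical patterns: these prefixes are assumptions about provider key
# shapes, not a list taken from secret-rotation.sh.
KEY_PATTERNS = {
    "openrouter": re.compile(r"sk-or-v1-[0-9a-f]{32,}"),
    "anthropic": re.compile(r"sk-ant-[A-Za-z0-9_-]{20,}"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, provider) for every line that looks like it holds a key."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for provider, pattern in KEY_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, provider))
    return hits


if __name__ == "__main__":
    # A fake key built at runtime so this file never contains a key-shaped literal.
    sample = 'API_KEY = "sk-or-v1-' + "a" * 64 + '"\nprint("hello")\n'
    for lineno, provider in scan_text(sample):
        print(f"line {lineno}: possible {provider} key")
```

A pattern pass only catches keys whose shape is known in advance, which is one reason a sweep would want more than one phase.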
Key Rotation Architecture

    macOS Keychain
    ├── openrouter-api-key   → Used by proxyd + benchmarks
    ├── openrouter-mgmt-key  → Used by rotate.py (admin-only)
    └── anthropic-api-key    → Used by coding-llm-bench

    tools/secret-rotator/rotate.py:
      1. Mint new value       (via provider Management API)
      2. Validate + verify    (shape check, then a real test call)
      3. Propagate            (Keychain + SOPS vault, every write backed up)
      4. Revoke old value     (via provider Management API)

    Zero browser. Zero human. Zero downtime.

The rotator doesn’t ask for permission. It mints the new value, proves it works, propagates it to the Keychain and the SOPS vault, and kills the old one — and because it validates the shape and aborts before propagation if verify fails, a half-rotated secret never reaches a service. If you’re still rotating API keys by logging into a dashboard and clicking buttons, you’re doing it wrong — and more importantly, you’re doing it rarely, which means your keys are old, which means the blast radius when one leaks is measured in months.
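
The ordering is what makes the four steps safe, so it is worth sketching. This is a minimal sketch of the control flow only, assuming each step is an injected callable. The `rotate` function, the `RotationAborted` exception, and the parameter names are hypothetical and do not reflect the actual interface of rotate.py.

```python
from typing import Callable


class RotationAborted(Exception):
    """Raised when the new value fails validation, before anything is propagated."""


def rotate(
    mint: Callable[[], str],
    shape_ok: Callable[[str], bool],
    verify: Callable[[str], bool],
    propagate: Callable[[str], None],
    revoke_old: Callable[[], None],
) -> str:
    """Mint, validate + verify, propagate, revoke. Abort before propagation on failure."""
    new_value = mint()
    if not shape_ok(new_value):
        raise RotationAborted("new value has an unexpected shape")
    if not verify(new_value):
        raise RotationAborted("new value failed the live test call")
    propagate(new_value)  # only reached once the new value is proven
    revoke_old()          # the old value is revoked last, after the new one is in place
    return new_value


if __name__ == "__main__":
    log = []
    rotate(
        mint=lambda: "new-key",
        shape_ok=lambda value: value.startswith("new-"),
        verify=lambda value: True,
        propagate=lambda value: log.append("propagate"),
        revoke_old=lambda: log.append("revoke"),
    )
    print(log)
```

Because both checks run before `propagate`, a failed verification leaves the Keychain, the vault, and the old key exactly as they were.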
Infrastructure Quality
This is a dated snapshot of a single cleanup sweep — measured 2026-04-11, not a live invariant. Some of it drifts back on purpose: every .bak-pre-X plist is a deliberate rollback point from the safe-edit ritual, so the disabled-file count breathes in and out as services get touched. The zeros below were true the day the broom came out; the discipline is re-running the broom, not pretending the dust never returns.
| Metric (2026-04-11 sweep) | Before | After |
|---|---|---|
| LaunchAgents: disabled/backup files | 23 | 0 |
| Shell scripts without set -euo pipefail | 13 | 0 |
| Rust compiler warnings | 4 | 0 |
| Stale adapter directories | 51 GB | 1.9 GB (2 active) |
| CSS !important overrides | 36 | 0 |
| Inline CSS styles in MDX | 9 | 0 |
| Legacy image path conventions | 55 | 0 |
| instance.yaml stale services | 5+ | 0 (all annotated) |
Every row in that table represents a class of problem that existed, was measured, and was eliminated. The 23 disabled LaunchAgent files were ghosts from previous configurations — not hurting anything, not helping anything, just cluttering the namespace and making it harder to tell what was real. The 13 shell scripts without set -euo pipefail were time bombs waiting for a failed command to silently continue. The 51 GB of stale adapters were eleven training experiments that nobody cleaned up because nobody knew which ones were active.
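
Re-running the broom is easier when the measurement is a script. As one example, the strict-mode row could be re-measured with something like the sketch below. It is an assumption-laden illustration, not the tool used for the 2026-04-11 sweep: `missing_strict_mode` and `audit` are hypothetical helpers, and the check only recognises the exact line `set -euo pipefail`.

```python
from pathlib import Path

STRICT_LINE = "set -euo pipefail"


def missing_strict_mode(script_text: str) -> bool:
    """True if no line of the script, ignoring comments, is `set -euo pipefail`."""
    for raw in script_text.splitlines():
        code = raw.split("#", 1)[0].strip()  # drop comments, including the shebang
        if code == STRICT_LINE:
            return False
    return True


def audit(root: Path) -> list[Path]:
    """List every *.sh file under root that lacks the strict-mode line."""
    return sorted(
        path
        for path in root.rglob("*.sh")
        if missing_strict_mode(path.read_text(errors="replace"))
    )


if __name__ == "__main__":
    for path in audit(Path.cwd()):
        print(f"missing strict mode: {path}")
```

An empty result corresponds to the 0 in the After column. Variants such as `set -eu` followed by a separate `set -o pipefail` would be reported as missing by this sketch.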
Container Doctrine
Most of Sanctum runs natively — Rust binaries, Python services, Node sidecars, all under launchd. When we do reach for a container, it is for software we did not write (Home Assistant, Outline, Postgres, Redis) and which we cannot meaningfully shrink. Those upstream images are 2.67 GB and 1.23 GB respectively, and they stay that way.
For our own containers the default is Chainguard wolfi-base plus a multi-stage build. Build the artifact in a stage with the full toolchain, then copy just the binary and its runtime deps into a wolfi-base final stage. The popular guidance (node:22-slim → node:22-alpine → gcr.io/distroless/nodejs22-debian12) is the right direction, but it stops one tier short. wolfi-base is a rolling-release image built from Wolfi packages, ships with glibc (not musl, so no surprise libpython symbol mismatches), and carries a smaller CVE surface than distroless because the apk package graph is curated for image-only use.
| Layer | Default |
|---|---|
| Build stage | cgr.dev/chainguard/wolfi-base:latest + apk add --no-cache <toolchain> |
| Runtime stage | cgr.dev/chainguard/wolfi-base:latest + only the runtime libs |
| .dockerignore | Required — never copy node_modules, __pycache__, .venv, tests, dev configs |
| User | A dedicated non-root user, never root; switch to it with USER <name> as the last instruction. Shannon uses adduser -u 1001 -G pentest, then USER pentest |
| Healthcheck | One process check; do not curl an internal HTTP endpoint that depends on the network stack being up |
Shannon (~/Projects/shannon/Dockerfile) is the canonical precedent in this workspace — wolfi-base builder and runtime, multi-stage, no apt-cache leftovers. Use it as the reference when adding any new custom container.
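
The non-root rule in the table is simple enough to lint. The following is a rough sketch, assuming a Dockerfile without line continuations on its FROM and USER lines. `final_user` and `runs_as_non_root` are hypothetical helpers, and the sample Dockerfile text is invented for the demo rather than copied from Shannon.

```python
from typing import Optional


def final_user(dockerfile_text: str) -> Optional[str]:
    """Return the user set by the last USER instruction in the final stage, if any."""
    user = None
    for raw in dockerfile_text.splitlines():
        parts = raw.strip().split(None, 1)
        if len(parts) < 2:
            continue
        instruction, arg = parts[0].upper(), parts[1].strip()
        if instruction == "FROM":
            user = None  # a new stage starts over with the base image's user
        elif instruction == "USER":
            user = arg.split(":", 1)[0]  # drop an optional :group suffix
    return user


def runs_as_non_root(dockerfile_text: str) -> bool:
    """Conservative check: a final stage with no USER instruction counts as a failure."""
    user = final_user(dockerfile_text)
    return user is not None and user not in {"root", "0"}


if __name__ == "__main__":
    sample = (
        "FROM cgr.dev/chainguard/wolfi-base:latest AS build\n"
        "RUN true\n"
        "FROM cgr.dev/chainguard/wolfi-base:latest\n"
        "USER pentest\n"
    )
    print(runs_as_non_root(sample))
```

Treating a missing USER as a failure is a deliberate choice in this sketch: it forces the Dockerfile to state its user rather than inherit one from the base image.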
Cloud Proxy Hardening
The cloud proxy (proxyd, the Rust daemon at ~/Projects/sanctum-rs/services/sanctum-proxy, on :4040) sits between the council and the upstream providers. It started life as a Python script and is a Rust binary now — the usual Sanctum arc: Python while a thing is feature-organic, Rust once it’s load-bearing. It exists because giving six AI agents unlimited access to a pay-per-token cloud API is how you wake up to a surprise invoice. Five layers of protection:
| Layer | What It Does |
|---|---|
| Key security | keychain.rs reads PROVIDER_API_KEY from the launchd env first, falls back to the macOS login Keychain — never in code or git |
| Fallback chain | Data-driven, from the fallbacks: map in config.yaml — not hardcoded |
| Cost control | Per-agent daily USD caps (budget.rs): $2.00 Windu and Mothma, $0.50 everyone else |
| Audit trail | Every call logged to ~/.sanctum/logs/cloud-proxy-audit.jsonl: agent, model, tokens, cost, latency |
| Rate limiting | Budget exhaustion trips a 503 before the request ever reaches a provider |
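
The cost-control and rate-limiting rows interact: the cap decides, and the 503 is how the decision is reported. Here is a minimal Python sketch of that interaction, not a port of budget.rs. The cap values come from the table above; the `BudgetGate` class, the lowercase agent keys, the agent name `yoda`, and the JSON field names in `audit_line` are all assumptions made for illustration.

```python
import json
from collections import defaultdict

# Cap values from the table; the lowercase agent keys are an assumption.
DAILY_CAPS_USD = {"windu": 2.00, "mothma": 2.00}
DEFAULT_CAP_USD = 0.50


class BudgetGate:
    """Tracks per-agent spend for one day and refuses requests once the cap is hit."""

    def __init__(self) -> None:
        self.spent = defaultdict(float)

    def check(self, agent: str) -> int:
        """Return an HTTP-style status: 200 to proceed, 503 when the budget is exhausted."""
        cap = DAILY_CAPS_USD.get(agent, DEFAULT_CAP_USD)
        return 503 if self.spent[agent] >= cap else 200

    def record(self, agent: str, cost_usd: float) -> None:
        """Add the cost of a completed call to the agent's running total."""
        self.spent[agent] += cost_usd


def audit_line(agent: str, model: str, tokens: int, cost_usd: float, latency_ms: int) -> str:
    """One JSONL audit record; the field names here are hypothetical."""
    return json.dumps({
        "agent": agent, "model": model, "tokens": tokens,
        "cost_usd": cost_usd, "latency_ms": latency_ms,
    })


if __name__ == "__main__":
    gate = BudgetGate()
    gate.record("yoda", 0.50)  # "yoda" is a made-up agent on the default cap
    print(gate.check("yoda"), gate.check("windu"))
```

Checking the budget before dispatch is what keeps an exhausted agent from costing anything further: the request is refused locally and no provider is contacted.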
The fallback chain is the interesting part, and the interesting part is that it’s data. Each primary model names its understudies in config.yaml — council-brain falls to council-code (the qwen2.5-coder-14b-instruct seat on :1234), which falls to council-27b, and so on down the map. proxy.rs walks that chain at request time; no recompile to reorder it. When every link is exhausted, the proxy returns an honest error instead of a hallucinated response. Graceful degradation, not graceful lying.
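
Walking a data-driven chain takes only a few lines. The sketch below is in Python rather than Rust and is not a transcription of proxy.rs. It assumes each model names a single next hop, which may be simpler than the real `fallbacks:` map; `FALLBACKS`, `call_with_fallback`, and `AllModelsExhausted` are hypothetical names.

```python
from typing import Callable, Optional

# Hypothetical stand-in for the `fallbacks:` map in config.yaml,
# assuming one understudy per model.
FALLBACKS = {
    "council-brain": "council-code",
    "council-code": "council-27b",
}


class AllModelsExhausted(Exception):
    """Every link in the chain failed; the caller gets an error, not a made-up reply."""


def call_with_fallback(
    model: str,
    try_model: Callable[[str], Optional[str]],
    fallbacks: dict = FALLBACKS,
) -> tuple[str, str]:
    """Try `model`, then each understudy in turn. Returns (model_that_answered, reply)."""
    tried = []
    current = model
    # The `not in tried` guard stops a misconfigured map from looping forever.
    while current is not None and current not in tried:
        tried.append(current)
        reply = try_model(current)  # None means this model is unavailable
        if reply is not None:
            return current, reply
        current = fallbacks.get(current)
    raise AllModelsExhausted(f"no model answered; tried {tried}")


if __name__ == "__main__":
    available = {"council-27b"}
    served_by, reply = call_with_fallback(
        "council-brain",
        lambda name: f"reply from {name}" if name in available else None,
    )
    print(served_by, reply)
```

Reordering the chain means editing the map, not the function, which is the property the paragraph above describes.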
The Standard
Every commit follows the same discipline:
- Code — write it
- Test — prove it works
- Document — explain why it exists
- Scan — verify no secrets leaked
- Push — ship it
If any step fails, the commit doesn’t ship. This is not optional. This is not aspirational. This is how the haus runs.
Because the only thing worse than an AI agent making a mistake is one making a mistake with your API keys, your health data, and your kid’s screen time schedule all in the same system — and nobody finding out until the credit card bill arrives. The discipline isn’t paranoia. It’s the minimum viable responsibility for running this kind of infrastructure in a place where people sleep.
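
The rule that a failed step stops the commit can be sketched as a sequential gate. This is an illustration only, assuming each step reduces to a callable that returns True or False. `run_gate` and the step names in the demo are hypothetical and do not describe an existing Sanctum hook.

```python
from typing import Callable

Step = tuple[str, Callable[[], bool]]


def run_gate(steps: list[Step]) -> tuple[bool, list[str]]:
    """Run steps in order and stop at the first failure. Returns (shipped, steps_run)."""
    ran = []
    for name, check in steps:
        ran.append(name)
        if not check():
            return False, ran  # later steps, including push, never run
    return True, ran


if __name__ == "__main__":
    shipped, ran = run_gate([
        ("test", lambda: True),
        ("document", lambda: True),
        ("scan", lambda: False),  # a leaked secret fails the scan
        ("push", lambda: True),
    ])
    print(shipped, ran)
```

Putting push last in the list is the whole mechanism: it can only run if every check before it returned True.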