Engineering Discipline

Date: 2026-04-11
Status: Active
Running AI agents in your haus is either professional engineering or professional negligence. There is no middle ground. The same system that controls your kid’s screen time also holds your API keys, your health data, and the credentials to your fund’s deal flow pipeline. If that sentence doesn’t make you want to write tests, nothing will.
This page documents the standards that keep Sanctum from becoming a cautionary tale — the kind that gets forwarded around Hacker News with subject lines like “man lets AI agents manage home, discovers API key on public GitHub three weeks later.” Every number below is verified by automated tests. If it’s not tested, it doesn’t exist.
Test Coverage
166 tests across 10 components. No untested code ships. No exceptions. No “we’ll add tests later.” Later is where bugs go to breed.
| Component | Tests | What They Verify |
|---|---|---|
| Smart Router (Rust) | 23 | Pattern matching, intent classification, mock backend dispatch |
| LoRA in Rust | 27 | Config parsing, quantize roundtrip, 248-pair merge, weight hash |
| Sanctum Olympics | 31 | Config loading, task YAML validation, judge parsing, mock e2e |
| Gemma 4 Pipeline | 21 | Data conversion, enrichment, training config, 2-iter run, memory |
| Cloud Proxy | 16 | Health, models, daily caps, fallback chain, agent extraction, audit |
| Carmack Eval v2 | 19 | Task structure, scoring rules, partial scores, HTTP error handling |
| Key Rotation | 5 | Dry-run, verify, keychain access, mask function |
| Secret Rotation | 10 | Scan, keychain audit, token patterns, usage display |
| Docs Quality | 8 | Frontmatter, CSS hygiene, inline styles, image paths, hero uniqueness |
| Build Release | 6 | Prerequisites, binary, metallib, dist packaging, deploy flags |
| Total | 166 | |
```bash
# Run everything. Subshells keep each `cd` from leaking into the next command.
(cd sanctum-rs && cargo test -- --test-threads=1)          # 50 Rust tests (23 Smart Router + 27 LoRA)
(cd sanctum-olympics && python test_olympics.py)           # 31 Python tests
(cd mlx-finetune && python tests/test_gemma4_pipeline.py)  # 21 pipeline tests
(cd mlx-finetune && python tests/test_carmack_v2.py)       # 19 eval tests
python tests/test_cloud_proxy.py                           # 16 proxy tests
bash tests/test_sanctum_docs.sh                            # 8 docs tests
bash ~/.sanctum/tests/test_rotate_openrouter.sh            # 5 rotation tests
bash ~/.sanctum/tests/test_secret_rotation.sh              # 10 secret tests
bash tests/test_build_release.sh                           # 6 build tests
```

Every test suite runs independently. Every test suite can be run by a human who just woke up and doesn’t remember what day it is. This is a design requirement, not an accident.
Secret Management
Zero hardcoded API keys. Anywhere. Verified across 31 repos in 2 GitHub organizations. This is the line in the sand that does not move.
The rule has an origin story. An API key was once hardcoded in a Python file and pushed to a public GitHub repo. It sat there for three weeks. Nobody noticed. The key had full API access. The bill was… educational. That was the last time a secret touched source control, and the infrastructure that prevents it from ever happening again is aggressive by design.
| Check | Status |
|---|---|
| Hardcoded keys in source files | 0 (all repos, both machines) |
| Hardcoded keys in public repos | 0 (Ogilthorp3 + Triptyq-Capital) |
| API keys in git history | Neutralized (old keys revoked) |
| OpenRouter key storage | macOS Keychain (both machines) |
| OpenRouter Management key | macOS Keychain (for programmatic rotation) |
| Key rotation | Zero-touch via rotate-openrouter.sh |
| Secret scanning | 6-phase sweep via secret-rotation.sh |
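A token-pattern scan of the kind listed in the table can be sketched roughly as follows. This is a minimal illustration, not the contents of secret-rotation.sh: the two regexes (an `sk-or-v1-` prefix for OpenRouter, an `sk-ant-` prefix for Anthropic), the minimum lengths, and the function names `mask` and `scan_text` are all assumptions chosen for the example.

```python
import re

# Illustrative token patterns. These are assumptions for the sketch,
# not the patterns that secret-rotation.sh actually uses.
TOKEN_PATTERNS = {
    "openrouter": re.compile(r"sk-or-v1-[A-Za-z0-9]{16,}"),
    "anthropic": re.compile(r"sk-ant-[A-Za-z0-9_-]{16,}"),
}


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters so a finding can be logged safely."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


def scan_text(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, provider, masked_match) for every suspected key."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for provider, pattern in TOKEN_PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((lineno, provider, mask(match.group())))
    return findings
```

Masking before reporting matters: a scanner that prints the full key into a log has just created a second leak.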
Key Rotation Architecture
```
macOS Keychain
├── openrouter-api-key   → Used by cloud proxy + benchmarks
├── openrouter-mgmt-key  → Used by rotation script (admin-only)
└── anthropic-api-key    → Used by coding-llm-bench

rotate-openrouter.sh:
  1. Create new key (via Management API)
  2. Verify new key works (test completion)
  3. Store in Keychain (both machines)
  4. Delete old key (via Management API)
  Zero browser. Zero human. Zero downtime.
```

The rotation script doesn’t ask for permission. It creates the new key, proves it works, stores it, and kills the old one. The entire cycle takes under ten seconds. If you’re still rotating API keys by logging into a dashboard and clicking buttons, you’re doing it wrong. More importantly, you’re doing it rarely, which means your keys are old, which means the blast radius when one leaks is measured in months.
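The ordering is what makes the rotation safe: the old key is deleted only after the new one has been verified and stored. A minimal Python sketch of that ordering is below. The four callables are hypothetical stand-ins for the Management API and Keychain calls that rotate-openrouter.sh makes, and the rollback on a failed verification is an assumption of this sketch, not documented behavior of the script.

```python
from typing import Callable


def rotate_key(
    old_key_id: str,
    create_key: Callable[[], tuple[str, str]],  # returns (new_key_id, new_secret)
    verify_key: Callable[[str], bool],          # e.g. a test completion with the new secret
    store_key: Callable[[str], None],           # e.g. write to the Keychain
    delete_key: Callable[[str], None],          # revoke a key by id
) -> str:
    """Rotate a key without ever leaving the system with zero working keys."""
    new_key_id, new_secret = create_key()       # 1. create
    if not verify_key(new_secret):              # 2. verify
        delete_key(new_key_id)                  # assumed rollback: drop the unusable new key
        raise RuntimeError("new key failed verification; old key left in place")
    store_key(new_secret)                       # 3. store
    delete_key(old_key_id)                      # 4. delete the old key last
    return new_key_id
```

Passing the side effects in as callables keeps the ordering testable with fakes, which is roughly what a dry-run mode needs.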
Infrastructure Quality
| Metric | Before | After |
|---|---|---|
| LaunchAgents: disabled/backup files | 23 | 0 |
| Shell scripts without `set -euo pipefail` | 13 | 0 |
| Rust compiler warnings | 4 | 0 |
| Stale adapter directories | 51 GB | 1.9 GB (2 active) |
| CSS `!important` overrides | 36 | 0 |
| Inline CSS styles in MDX | 9 | 0 |
| Legacy image path conventions | 55 | 0 |
| `instance.yaml` stale services | 5+ | 0 (all annotated) |
Every row in that table represents a class of problem that existed, was measured, and was eliminated. The 23 disabled LaunchAgent files were ghosts from previous configurations — not hurting anything, not helping anything, just cluttering the namespace and making it harder to tell what was real. The 13 shell scripts without set -euo pipefail were time bombs waiting for a failed command to silently continue. The 51 GB of stale adapters were eleven training experiments that nobody cleaned up because nobody knew which ones were active.
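For context, `set -e` exits on a failed command, `-u` treats unset variables as errors, and `-o pipefail` makes a pipeline fail if any stage fails. A count like the 13 in the table can be produced by a small audit. The sketch below is one hypothetical way to do it; the function name and the exact-line matching rule are assumptions, and it deliberately ignores equivalent spellings such as `set -eu -o pipefail`.

```python
from pathlib import Path

STRICT_LINE = "set -euo pipefail"


def scripts_missing_strict_mode(root: Path) -> list[Path]:
    """Return .sh files under root that never declare `set -euo pipefail`."""
    missing = []
    for script in sorted(root.rglob("*.sh")):
        lines = script.read_text(errors="replace").splitlines()
        # Only an exact, uncommented line counts; "# set -euo pipefail" does not.
        if not any(line.strip() == STRICT_LINE for line in lines):
            missing.append(script)
    return missing
```

Run against a scripts directory, an empty result is the "After" column.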
Cloud Proxy Hardening
The cloud proxy (sanctum-cloud-proxy.py) sits between the Smart Router and OpenRouter. It exists because giving six AI agents unlimited access to a pay-per-token cloud API is how you wake up to a surprise invoice. Five layers of protection:
| Layer | What It Does |
|---|---|
| Key security | API key from macOS Keychain only — never in code, env, or git |
| Fallback chain | Cloud → Coder-14B → Council MLX → error |
| Cost control | Per-agent daily caps ($2/day Windu, $0.50 default) |
| Audit trail | Every cloud call logged: agent, tokens, cost, latency |
| Rate limiting | Circuit breaker on spending caps |
The fallback chain is the interesting part. When Opus is unavailable or the daily cap is hit, the request falls to Coder-14B. When Coder-14B is down, it falls to Council MLX. When everything is down, the system returns an honest error instead of a hallucinated response. Graceful degradation, not graceful lying.
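The interaction between the daily cap and the fallback chain can be sketched as follows. The tier order and the cap figures come from the table; everything else is an assumption for illustration, including the names `dispatch` and `cap_reached`, the agent key `"windu"`, the way spend is passed in, and treating any exception from a backend as "unavailable". It is not the code in sanctum-cloud-proxy.py.

```python
from typing import Callable

# Cap figures from the table; the dict layout and agent key are assumptions.
DAILY_CAPS_USD = {"windu": 2.00}
DEFAULT_CAP_USD = 0.50

Backend = Callable[[str], str]


class AllBackendsDown(RuntimeError):
    """Raised instead of fabricating an answer when nothing can serve the request."""


def cap_reached(agent: str, spent_today_usd: float) -> bool:
    return spent_today_usd >= DAILY_CAPS_USD.get(agent, DEFAULT_CAP_USD)


def dispatch(
    prompt: str,
    agent: str,
    spent_today_usd: float,
    cloud: Backend,
    local_backends: list[tuple[str, Backend]],
) -> tuple[str, str]:
    """Try cloud first (if the agent is under its cap), then each local tier in order."""
    chain = list(local_backends)
    if not cap_reached(agent, spent_today_usd):
        chain.insert(0, ("cloud", cloud))
    for name, backend in chain:
        try:
            return name, backend(prompt)
        except Exception:
            continue  # this tier is unavailable; fall through to the next one
    raise AllBackendsDown("no backend available")
```

Returning the tier name alongside the response gives the audit trail something concrete to log, and the final exception is the "honest error" case: the caller learns that nothing answered.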
The Standard
Every commit follows the same discipline:
- Code — write it
- Test — prove it works
- Document — explain why it exists
- Scan — verify no secrets leaked
- Push — ship it
If any step fails, the commit doesn’t ship. This is not optional. This is not aspirational. This is how the haus runs.
Because the only thing worse than an AI agent making a mistake is an AI agent making a mistake with your API keys, your health data, and your kid’s screen time schedule all in the same system — and nobody finding out until the credit card bill arrives. The discipline isn’t paranoia. It’s the minimum viable responsibility for running this kind of infrastructure in a place where people sleep.