Memory Service

The Memory Service is a Rust binary (port 42069) that gives every Sanctum agent persistent memory without requiring any of them to manage it. It is the primary engine; the older memory-vault-mcp shim has been replaced by something faster, smaller, and built to sit directly behind the proxy so that remembering can happen as a side effect of thinking — which, if you squint, is how it works for the rest of us too.
Dual storage: SQLite with FTS5 for search, markdown files for humans and Obsidian. The database is fast. The files are legible. The agents don’t care which one you read. You will care, at 2 AM, when you need to understand why Yoda thinks the internet goes down every Thursday.
How It Works with the Proxy
Section titled “How It Works with the Proxy”The proxy (port 4040) already sees every conversation in the haus. Making it the memory capture point was less a design decision than an observation: the data was already flowing through the wire. We just started writing it down.
The proxy carries the memory hooks (sanctum-proxy/src/memory.rs), but the two services aren’t introduced yet: the proxy’s default memory.url still points at the service’s old code-default port (18097), while the live service answers on 42069. Until that’s reconciled, capture is dark and agents recall only what they wrote directly. The hooks themselves are three, all non-blocking:
- Pre-request — The proxy queries sanctum-memory for cached context relevant to the incoming conversation and injects it into the system message. The agent receives memories it didn’t ask for and doesn’t know it received. This is, technically, inception.
- Post-response — After streaming the response, the proxy fires an async ingest call with the conversation data. No waiting. No acknowledgment. Fire and forget.
- Failure isolation — Memory failures never block or slow requests. If the memory service is down, the proxy sends the request without context and logs a warning. Agents can think without remembering. They just think less well.
Memory Types
Section titled “Memory Types”Every memory has a type. The type determines where it lives, how long it survives, and how it’s retrieved.
| Type | Purpose | Example |
|---|---|---|
semantic | Facts, preferences, knowledge | ”User prefers terse responses” |
episodic | Events with timestamps | ”Internet outage March 23 at 3:39 AM” |
procedural | How-to knowledge, runbooks | ”To restart LM Studio, kill the process then…” |
observation | Agent-noted patterns | ”Disk usage trending up 2% per week” |
session_summary | Compressed conversation logs | End-of-session distillation |
The distinction between semantic and episodic matters for retrieval. When an agent asks “what does the user prefer,” you search semantic. When it asks “what happened last Thursday,” you search episodic. Conflating them is how you get a memory system that answers “what happened last Thursday” with “the user prefers dark mode.”
Storage Architecture
Section titled “Storage Architecture”Dual storage, matching the existing vault layout:
| Backend | Role | Format |
|---|---|---|
SQLite (.vault.db) | Search, metadata, indexes | FTS5 full-text, JSON1 metadata |
| Markdown files | Human-readable, git-tracked | YAML frontmatter + body |
The markdown directories — inbox/, knowledge/, events/, procedures/ — are unchanged from the vault. Obsidian still works. Git history still works. The database is the index; the files are the truth.
Importance Scoring
Section titled “Importance Scoring”Every memory gets a score between 0.0 and 1.0. The score determines how long it lives.
Formula: base × source_weight × recency × access_boost × link_boost
| Factor | Calculation | Rationale |
|---|---|---|
| Source weight | user=0.9, system=0.85, claude-code=0.7, gemini-cli=0.7, openclaw=0.7, HA=0.5 | User-stated facts outrank machine observations |
| Recency | hours^(-0.3) (power-law decay) | Recent memories matter more, but the decay is gentle |
| Access boost | 1 + ln(access_count + 1) | Frequently accessed memories earn protection |
| Link boost | 1 + 0.1 × tag_count | More tags, more reach — a proxy for connectedness, until backlinks land |
TTL Rules
Section titled “TTL Rules”Importance determines lifespan. The system forgets on purpose — and considers this a feature.
| Importance | TTL | Notes |
|---|---|---|
| > 0.8 | Permanent | Core knowledge, user-stated preferences |
| 0.5 – 0.8 | 90 days | Agent-observed patterns, recurring events |
| 0.3 – 0.5 | 30 days | Single observations, transient context |
| < 0.3 | 7 days | Ephemeral session data |
Protection rules: Memories with importance above 0.8 or an access count of 5 or more are exempt from expiry. If the system keeps reaching for a memory, the memory stays. Even if the math says otherwise.
Consolidation
Section titled “Consolidation”Runs every 6 hours. The process is hybrid: the cheap work happens immediately, LLM enrichment is deferred to council-27b on a best-effort basis. If the local model is busy or down, consolidation finishes without enrichment and tries again next cycle — the report just shows llm_enriched: 0 and nobody panics.
- Scan inbox — Find raw notes older than 24 hours
- Recompute scores — Update importance for all active memories
- LLM enrichment — Extract entities, tags, and relationships via
council-27b(best-effort) - Promote — Move consolidated notes to
knowledge/,events/, orprocedures/ - Expire — Apply TTL rules, archive expired notes; archived notes are deleted after 90 days
- Enforce caps — Inbox: 300, Knowledge: 1000, Events: 500, Procedures: 200
API Reference
Section titled “API Reference”All endpoints return JSON. The service binds to 127.0.0.1:42069. Retrieval is GET-with-query-params, not POST — the things you read are reads, and the router believes in HTTP verbs.
| Method | Endpoint | Description |
|---|---|---|
| GET | /v1/recall?agent=&limit= | Context-aware retrieval — memories ranked by relevance for an agent |
| GET | /v1/search?q=&agent=&limit= | FTS5 full-text search, optionally scoped to one agent |
| POST | /v1/ingest | Async ingestion of conversation data (called by the proxy) |
| POST | /v1/write | Create or update a note |
| GET | /v1/read?path= | Read a note by markdown path (auto-tracks access count) |
| DELETE | /v1/note?path= | Remove a note by markdown path |
| GET | /health | Liveness probe — {status, service, version} |
| POST | /v1/consolidate | Trigger a consolidation pass; returns the per-stage report |
Configuration
Section titled “Configuration”All settings live in instance.yaml under services.memory_vault. The live block sets only enabled and port; every other field below is a struct default the service fills in if you leave it out:
services: memory_vault: enabled: true port: 42069 vault_dir: "~/.sanctum/memory" # db lives at {vault_dir}/.vault.db consolidation_interval_hours: 6 # hours, not seconds llm_url: "http://127.0.0.1:1234/v1/chat/completions" # LM Studio-style endpoint llm_model: "council-27b" # enrichment model, best-effortTechnical Specifications
Section titled “Technical Specifications”| Property | Value |
|---|---|
| Host | 127.0.0.1 |
| Port | 42069 |
| Binary | ~6.2MB (Rust; SQLite compiled in, links only system libs) |
| Storage | SQLite 3 + FTS5, markdown files |
| Model tier | council-27b (enrichment only, best-effort) |
| Dependencies | None at runtime (SQLite bundled via rusqlite) |
| LaunchAgent | com.sanctum.memory-vault |