Memory Service

The constellation — every memory a node, every connection a reason to remember

The Memory Service is a Rust binary (port 42069) that gives every Sanctum agent persistent memory without requiring any of them to manage it. It is the primary engine; the older memory-vault-mcp shim has been replaced by something faster, smaller, and built to sit directly behind the proxy so that remembering can happen as a side effect of thinking — which, if you squint, is how it works for the rest of us too.

Dual storage: SQLite with FTS5 for search, markdown files for humans and Obsidian. The database is fast. The files are legible. The agents don’t care which one you read. You will care, at 2 AM, when you need to understand why Yoda thinks the internet goes down every Thursday.

How It Works with the Proxy

The proxy (port 4040) already sees every conversation in the haus. Making it the memory capture point was less a design decision than an observation: the data was already flowing through the wire. We just started writing it down.

The proxy carries the memory hooks (sanctum-proxy/src/memory.rs), but the two services aren’t introduced yet: the proxy’s default memory.url still points at the service’s old code-default port (18097), while the live service answers on 42069. Until that’s reconciled, capture is dark and agents recall only what they wrote directly. The hooks themselves are three, all non-blocking:

Pre-request — The proxy queries sanctum-memory for cached context relevant to the incoming conversation and injects it into the system message. The agent receives memories it didn’t ask for and doesn’t know it received. This is, technically, inception.
Post-response — After streaming the response, the proxy fires an async ingest call with the conversation data. No waiting. No acknowledgment. Fire and forget.
Failure isolation — Memory failures never block or slow requests. If the memory service is down, the proxy sends the request without context and logs a warning. Agents can think without remembering. They just think less well.

Memory Types

Every memory has a type. The type determines where it lives, how long it survives, and how it’s retrieved.

Type	Purpose	Example
`semantic`	Facts, preferences, knowledge	”User prefers terse responses”
`episodic`	Events with timestamps	”Internet outage March 23 at 3:39 AM”
`procedural`	How-to knowledge, runbooks	”To restart LM Studio, kill the process then…”
`observation`	Agent-noted patterns	”Disk usage trending up 2% per week”
`session_summary`	Compressed conversation logs	End-of-session distillation

The distinction between semantic and episodic matters for retrieval. When an agent asks “what does the user prefer,” you search semantic. When it asks “what happened last Thursday,” you search episodic. Conflating them is how you get a memory system that answers “what happened last Thursday” with “the user prefers dark mode.”

Storage Architecture

Dual storage, matching the existing vault layout:

Backend	Role	Format
SQLite (`.vault.db`)	Search, metadata, indexes	FTS5 full-text, JSON1 metadata
Markdown files	Human-readable, git-tracked	YAML frontmatter + body

The markdown directories — inbox/, knowledge/, events/, procedures/ — are unchanged from the vault. Obsidian still works. Git history still works. The database is the index; the files are the truth.

Importance Scoring

Every memory gets a score between 0.0 and 1.0. The score determines how long it lives.

Formula: base × source_weight × recency × access_boost × link_boost

Factor	Calculation	Rationale
Source weight	user=0.9, system=0.85, claude-code=0.7, gemini-cli=0.7, openclaw=0.7, HA=0.5	User-stated facts outrank machine observations
Recency	`hours^(-0.3)` (power-law decay)	Recent memories matter more, but the decay is gentle
Access boost	`1 + ln(access_count + 1)`	Frequently accessed memories earn protection
Link boost	`1 + 0.1 × tag_count`	More tags, more reach — a proxy for connectedness, until backlinks land

TTL Rules

Importance determines lifespan. The system forgets on purpose — and considers this a feature.

Importance	TTL	Notes
> 0.8	Permanent	Core knowledge, user-stated preferences
0.5 – 0.8	90 days	Agent-observed patterns, recurring events
0.3 – 0.5	30 days	Single observations, transient context
< 0.3	7 days	Ephemeral session data

Protection rules: Memories with importance above 0.8 or an access count of 5 or more are exempt from expiry. If the system keeps reaching for a memory, the memory stays. Even if the math says otherwise.

Consolidation

Runs every 6 hours. The process is hybrid: the cheap work happens immediately, LLM enrichment is deferred to council-27b on a best-effort basis. If the local model is busy or down, consolidation finishes without enrichment and tries again next cycle — the report just shows llm_enriched: 0 and nobody panics.

Scan inbox — Find raw notes older than 24 hours
Recompute scores — Update importance for all active memories
LLM enrichment — Extract entities, tags, and relationships via council-27b (best-effort)
Promote — Move consolidated notes to knowledge/, events/, or procedures/
Expire — Apply TTL rules, archive expired notes; archived notes are deleted after 90 days
Enforce caps — Inbox: 300, Knowledge: 1000, Events: 500, Procedures: 200

API Reference

All endpoints return JSON. The service binds to 127.0.0.1:42069. Retrieval is GET-with-query-params, not POST — the things you read are reads, and the router believes in HTTP verbs.

Method	Endpoint	Description
GET	`/v1/recall?agent=&limit=`	Context-aware retrieval — memories ranked by relevance for an agent
GET	`/v1/search?q=&agent=&limit=`	FTS5 full-text search, optionally scoped to one agent
POST	`/v1/ingest`	Async ingestion of conversation data (called by the proxy)
POST	`/v1/write`	Create or update a note
GET	`/v1/read?path=`	Read a note by markdown path (auto-tracks access count)
DELETE	`/v1/note?path=`	Remove a note by markdown path
GET	`/health`	Liveness probe — `{status, service, version}`
POST	`/v1/consolidate`	Trigger a consolidation pass; returns the per-stage report

Configuration

All settings live in instance.yaml under services.memory_vault. The live block sets only enabled and port; every other field below is a struct default the service fills in if you leave it out:

services:
  memory_vault:
    enabled: true
    port: 42069
    vault_dir: "~/.sanctum/memory"                    # db lives at {vault_dir}/.vault.db
    consolidation_interval_hours: 6                    # hours, not seconds
    llm_url: "http://127.0.0.1:1234/v1/chat/completions"  # LM Studio-style endpoint
    llm_model: "council-27b"                           # enrichment model, best-effort

Technical Specifications

Property	Value
Host	`127.0.0.1`
Port	42069
Binary	~6.2MB (Rust; SQLite compiled in, links only system libs)
Storage	SQLite 3 + FTS5, markdown files
Model tier	`council-27b` (enrichment only, best-effort)
Dependencies	None at runtime (SQLite bundled via rusqlite)
LaunchAgent	`com.sanctum.memory-vault`