Database Schema — Memory Vault AI¶
Two backends work together: ChromaDB for vector search, SQLite for structured metadata. Never query them independently from feature code — always use
StorageLayer.
SQLite Schema (via SQLModel)¶
Table: memory_chunks¶
Primary metadata store for all memory records.
CREATE TABLE memory_chunks (
id TEXT PRIMARY KEY, -- "mem_" + uuid4 hex
user_id TEXT NOT NULL,
session_id TEXT NOT NULL,
memory_type TEXT NOT NULL, -- episodic|semantic|working|procedural
content TEXT NOT NULL,
importance REAL NOT NULL, -- 0.0 – 1.0
token_count INTEGER NOT NULL,
compressed BOOLEAN DEFAULT FALSE,
compression_source BOOLEAN DEFAULT FALSE,
source_session_id TEXT, -- set when this is a compression summary
chroma_id TEXT, -- corresponding ChromaDB document id (NULL for procedural)
created_at DATETIME NOT NULL,
updated_at DATETIME NOT NULL,
metadata TEXT DEFAULT '{}' -- JSON blob for extensibility
);
CREATE INDEX idx_memory_user ON memory_chunks(user_id);
CREATE INDEX idx_memory_user_type ON memory_chunks(user_id, memory_type);
CREATE INDEX idx_memory_session ON memory_chunks(session_id);
CREATE INDEX idx_memory_compressed ON memory_chunks(user_id, compressed);
Table: procedural_memory¶
Key-value store for user preferences and habits.
CREATE TABLE procedural_memory (
id TEXT PRIMARY KEY,
user_id TEXT NOT NULL,
key TEXT NOT NULL, -- e.g. "communication_style", "preferred_language"
value TEXT NOT NULL, -- JSON value
confidence REAL DEFAULT 1.0,
updated_at DATETIME NOT NULL,
source_chunk_id TEXT -- which memory_chunk triggered this
);
CREATE UNIQUE INDEX idx_proc_user_key ON procedural_memory(user_id, key);
Table: sessions¶
CREATE TABLE sessions (
id TEXT PRIMARY KEY,
user_id TEXT NOT NULL,
started_at DATETIME NOT NULL,
last_activity DATETIME NOT NULL,
ended_at DATETIME,
compressed BOOLEAN DEFAULT FALSE,
memory_count INTEGER DEFAULT 0,
total_tokens INTEGER DEFAULT 0
);
CREATE INDEX idx_sessions_user ON sessions(user_id);
Table: compression_jobs¶
CREATE TABLE compression_jobs (
id TEXT PRIMARY KEY,
user_id TEXT NOT NULL,
status TEXT NOT NULL, -- queued|running|completed|failed
sessions_compressed INTEGER DEFAULT 0,
memories_created INTEGER DEFAULT 0,
error TEXT,
created_at DATETIME NOT NULL,
completed_at DATETIME
);
ChromaDB Collections¶
Collection: memory_{user_id} (per-user, or global with user_id filter)¶
Decision: One collection per user vs. one global collection with metadata filtering. We use one global collection with
user_idin metadata for simpler ops. Seedocs/adr/ADR-006-chroma-collection-strategy.md.
Document structure:
collection.add(
ids=["mem_abc123"],
embeddings=[[0.1, 0.2, ...]], # 384-dim float32
documents=["raw text content"],
metadatas=[{
"user_id": "user_123",
"session_id": "sess_xyz",
"memory_type": "episodic",
"importance": 0.72,
"compressed": False,
"created_at": "2024-01-20T10:30:00Z",
}]
)
Query:
results = collection.query(
query_embeddings=[query_vec],
n_results=20,
where={
"user_id": "user_123",
"compressed": False,
}
)
Note: Procedural memory is NOT stored in ChromaDB — only in SQLite. It's retrieved directly by key, not by similarity search.
ID Conventions¶
| Entity | Format | Example |
|---|---|---|
| Memory chunk | mem_ + 12 hex chars |
mem_a3f9b2c10d4e |
| Session | sess_ + 8 hex chars |
sess_9a3b1c2d |
| User | Passed in by caller | user_alice or UUID |
| Compression job | job_ + 8 hex chars |
job_f1e2d3c4 |
Migration Strategy¶
Schema migrations use Alembic. Migration files are in alembic/versions/.
# Generate migration
alembic revision --autogenerate -m "add compression_jobs table"
# Apply migrations
alembic upgrade head
# Rollback one step
alembic downgrade -1
Never edit SQLite schema manually. All changes must go through Alembic.