Configuration Guide — Memory Vault AI
All configuration is driven by environment variables and the MemoryConfig Pydantic model.
Environment variables always override MemoryConfig defaults.
A .env file in the project root is loaded automatically in development.
Quick Setup
cp .env.example .env
# Edit .env with your settings
Minimal .env (Development)
# Uses ChromaDB + SQLite locally, no auth, no external services
ML_CHROMA_PATH=./data/chroma
ML_SQLITE_PATH=./data/memory.db
ML_EMBEDDING_MODEL=all-MiniLM-L6-v2
Production .env
# Storage
ML_STORAGE_BACKEND=qdrant
ML_QDRANT_URL=http://qdrant:6333
ML_QDRANT_API_KEY=your-qdrant-cloud-key
ML_SQLITE_PATH=/var/lib/memory-vault/memory.db
# Security
ML_API_KEY=your-random-64-char-secret
ML_CORS_ORIGINS=https://yourapp.com,https://app2.com
# Embedding
ML_EMBEDDING_MODEL=all-MiniLM-L6-v2
ML_EMBEDDING_DEVICE=cpu # or: cuda, mps
# Memory behavior
ML_DEFAULT_TOKEN_BUDGET=2000
ML_DEFAULT_TOP_K=5
ML_COMPRESSION_THRESHOLD=10
ML_IMPORTANCE_THRESHOLD=0.3
ML_RERANKER_ENABLED=false
ML_RERANKER_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2
# Compression LLM (used for summarizing old episodic memories)
ML_COMPRESSION_MODEL=claude-haiku-4-5-20251001
ML_COMPRESSION_API_KEY=your-anthropic-key
# Server
ML_WORKERS=4
ML_PORT=8000
ML_LOG_LEVEL=INFO
ML_LOG_FORMAT=json
ML_LOG_SANITIZE=true # Prevent memory content appearing in logs
All Configuration Options
Storage
| Variable |
Type |
Default |
Description |
ML_STORAGE_BACKEND |
str |
chroma |
Vector store: chroma or qdrant |
ML_CHROMA_PATH |
str |
./data/chroma |
ChromaDB on-disk path |
ML_SQLITE_PATH |
str |
./data/memory.db |
SQLite file path |
ML_QDRANT_URL |
str |
— |
Qdrant server URL (if using Qdrant) |
ML_QDRANT_API_KEY |
str |
— |
Qdrant Cloud API key (optional for local) |
ML_QDRANT_COLLECTION |
str |
memory_vault |
Qdrant collection name |
ML_METADATA_BACKEND |
str |
sqlite |
Metadata store: sqlite or postgres |
ML_POSTGRES_URL |
str |
— |
PostgreSQL URL for multi-instance deployments |
Embedding
| Variable |
Type |
Default |
Description |
ML_EMBEDDING_MODEL |
str |
all-MiniLM-L6-v2 |
Sentence-transformers model name |
ML_EMBEDDING_DEVICE |
str |
cpu |
cpu, cuda, or mps |
ML_EMBEDDING_BATCH_SIZE |
int |
32 |
Chunks per embedding batch |
ML_EMBEDDING_CACHE |
bool |
true |
Cache embeddings by content hash |
Available embedding models (trade-off: quality vs. speed):
| Model |
Dimensions |
Speed |
Quality |
Best for |
all-MiniLM-L6-v2 |
384 |
Fast |
Good |
Default — development + production |
all-mpnet-base-v2 |
768 |
Medium |
Better |
When retrieval quality matters most |
multi-qa-MiniLM-L6-cos-v1 |
384 |
Fast |
Good |
Query-optimized retrieval |
paraphrase-multilingual-MiniLM-L12-v2 |
384 |
Medium |
Good |
Multilingual content |
Memory Behavior
| Variable |
Type |
Default |
Description |
ML_DEFAULT_TOKEN_BUDGET |
int |
2000 |
Default max tokens per recall |
ML_DEFAULT_TOP_K |
int |
5 |
Default memories returned per recall |
ML_IMPORTANCE_THRESHOLD |
float |
0.3 |
Min importance score to save a chunk |
ML_COMPRESSION_THRESHOLD |
int |
10 |
Sessions before auto-compression triggers |
ML_RERANKER_ENABLED |
bool |
false |
Enable cross-encoder re-ranking (adds latency) |
ML_RERANKER_MODEL |
str |
cross-encoder/ms-marco-MiniLM-L-6-v2 |
Cross-encoder model used when reranker is enabled |
ML_MAX_CHUNK_TOKENS |
int |
300 |
Max tokens per chunk during ingestion |
ML_MIN_CHUNK_TOKENS |
int |
50 |
Min tokens per chunk (merge if smaller) |
Compression
| Variable |
Type |
Default |
Description |
ML_COMPRESSION_MODEL |
str |
— |
LLM for summarization (e.g. claude-haiku-4-5-20251001) |
ML_COMPRESSION_API_KEY |
str |
— |
API key for compression LLM |
ML_COMPRESSION_API_BASE |
str |
— |
Custom base URL (for OpenAI-compatible APIs) |
ML_COMPRESSION_SESSIONS |
int |
5 |
Sessions to compress per job run |
API Server
| Variable |
Type |
Default |
Description |
ML_API_KEY |
str |
— |
Enables Bearer auth. Unset = no auth (dev only). |
ML_PORT |
int |
8000 |
API server port |
ML_HOST |
str |
0.0.0.0 |
Bind address |
ML_WORKERS |
int |
1 |
Uvicorn worker count |
ML_CORS_ORIGINS |
str |
* |
Comma-separated allowed origins |
ML_RATE_LIMIT_SAVE |
int |
100 |
Max save requests/min per user |
ML_RATE_LIMIT_RECALL |
int |
200 |
Max recall requests/min per user |
Current enforcement scope:
- ML_RATE_LIMIT_SAVE applies to POST /v1/memory
- ML_RATE_LIMIT_RECALL applies to GET /v1/memory/recall
Observability
| Variable |
Type |
Default |
Description |
ML_LOG_LEVEL |
str |
INFO |
DEBUG, INFO, WARNING, ERROR |
ML_LOG_FORMAT |
str |
text |
text or json |
ML_LOG_SANITIZE |
bool |
false |
Redact memory content from logs |
ML_METRICS_ENABLED |
bool |
false |
Enable Prometheus /metrics endpoint |
Metrics endpoint behavior:
- When ML_METRICS_ENABLED=true, GET /metrics returns Prometheus text exposition.
- When ML_METRICS_ENABLED=false, GET /metrics returns 404.
Using MemoryConfig in Code
from memory_vault import MemoryLayer, MemoryConfig
config = MemoryConfig(
token_budget=3000,
top_k=8,
embedding_model="all-mpnet-base-v2",
storage_backend="qdrant",
qdrant_url="http://localhost:6333",
compression_threshold=5,
reranker_enabled=True,
reranker_model="cross-encoder/ms-marco-MiniLM-L-6-v2",
)
memory = MemoryLayer(user_id="alice", config=config)
Environment variables take precedence over MemoryConfig values.
MemoryConfig defaults take precedence over built-in defaults.
.env.example
Copy this file to .env and fill in your values:
# ── Storage ──────────────────────────────────────
ML_STORAGE_BACKEND=chroma
ML_CHROMA_PATH=./data/chroma
ML_SQLITE_PATH=./data/memory.db
# ML_QDRANT_URL=http://localhost:6333
# ML_QDRANT_API_KEY=
# ── Embedding ────────────────────────────────────
ML_EMBEDDING_MODEL=all-MiniLM-L6-v2
ML_EMBEDDING_DEVICE=cpu
# ── Memory behavior ──────────────────────────────
ML_DEFAULT_TOKEN_BUDGET=2000
ML_DEFAULT_TOP_K=5
ML_IMPORTANCE_THRESHOLD=0.3
ML_COMPRESSION_THRESHOLD=10
ML_RERANKER_ENABLED=false
# ML_RERANKER_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2
# ── Compression LLM (optional) ───────────────────
# ML_COMPRESSION_MODEL=claude-haiku-4-5-20251001
# ML_COMPRESSION_API_KEY=
# ── API Server ───────────────────────────────────
# ML_API_KEY= # Leave unset for local dev
ML_PORT=8000
ML_WORKERS=1
ML_CORS_ORIGINS=*
# ── Logging ──────────────────────────────────────
ML_LOG_LEVEL=INFO
ML_LOG_FORMAT=text
ML_LOG_SANITIZE=false