Introduction
memledger is the open-source trust layer for multi-agent AI. It is a Python library — install it via pip, point it at a vector store and an attribute store, and your agents have memory you can attribute, gate on confidence, and audit end-to-end.
OSS by default, no lock-in. Drops onto AWS managed backends (Aurora, OpenSearch, Bedrock) when you need them.
pip install memledger[oss] # Postgres + pgvector + local embeddings (default)
pip install memledger[aws] # Aurora PostgreSQL + Bedrock
Most memory infrastructure optimizes recall quality: how accurately the system retrieves something relevant. memledger adds a complementary layer, trust: who wrote this memory, how confident they were, what they derived it from, and how it has been used since.
In a single-agent system the distinction is academic. In a multi-agent system, where agents read and act on each other's beliefs, it is the difference between systems that fail gracefully and systems that quietly hallucinate at scale.
The shape of the trust layer
Every memledger memory carries first-class fields for attribution (created_by, session_id, namespace), confidence (confidence, hedged), and derivation (derived_from, supersedes, workflow_id, triggered_by). These are not sidecar metadata — they are the record.
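To make the shape of such a record concrete, here is a minimal sketch of the fields listed above as a plain dataclass. `MemoryRecord` is a hypothetical stand-in, not memledger's actual class. The field names come from the list above, but the types, the defaults, and the extra `memory_id` and `content` fields are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemoryRecord:
    """Hypothetical stand-in for a memory record; not memledger's own class."""

    # Assumed fields, not named in the text above.
    memory_id: str
    content: str

    # Attribution
    created_by: str
    session_id: str
    namespace: str

    # Confidence
    confidence: float                    # assumed to lie in [0.0, 1.0]
    hedged: bool = False                 # assumed to be a boolean flag

    # Derivation
    derived_from: tuple[str, ...] = ()   # assumed: ids of parent memories
    supersedes: Optional[str] = None     # assumed: id of a replaced memory
    workflow_id: Optional[str] = None
    triggered_by: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject confidence values outside the assumed range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


# One agent records an observation; another derives a summary from it.
observation = MemoryRecord(
    memory_id="m1",
    content="Vendor quoted a 12% price increase.",
    created_by="scout-agent",
    session_id="s-1",
    namespace="pricing",
    confidence=0.6,
    hedged=True,
)
summary = MemoryRecord(
    memory_id="m2",
    content="Prices are expected to rise next quarter.",
    created_by="analyst-agent",
    session_id="s-2",
    namespace="pricing",
    confidence=0.9,
    derived_from=("m1",),
)
```

Because attribution, confidence, and derivation live on the record itself, a reader of `summary` can see which agent wrote it and which memory it came from without consulting a separate metadata store.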
On top of those fields memledger applies four guarantees you can audit:
- Provenance chains — every memory tracks its derivation. Chains span agents and sessions and form a DAG you can query.
- Weakest-link confidence — effective confidence at retrieval is min(declared, chain.min_confidence). A high-confidence claim derived from a low-confidence ancestor cannot outscore its weakest link.
- Conflict detection — every add() checks for near-duplicates in the same namespace. Conflicts dispatch to a configurable resolver and emit a CONFLICTS edge in the chain store.
- MAI rubric (Memory Attribution Integrity) — the rubric memledger uses to score memory quality, runnable in three tiers: deterministic (Tier 1, no LLM), RAGAS LLM-as-judge (Tier 2, provider-agnostic via LiteLLM), and structural (Tier 3, OTEL-span based). DeepEval, Phoenix Evals, LangSmith, OpenAI Evals, TruLens, and AgentCore Evaluations adapters are on the v2.2+ evaluation roadmap.
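The weakest-link rule can be illustrated with a small standalone sketch. It does not use memledger's API: the function name `effective_confidence` and the two plain dictionaries standing in for the chain store are assumptions made for illustration. It also assumes the derivation graph is acyclic, as a DAG is.

```python
from __future__ import annotations


def effective_confidence(
    memory_id: str,
    declared: dict[str, float],
    derived_from: dict[str, list[str]],
    _cache: dict[str, float] | None = None,
) -> float:
    """Return the minimum of a memory's declared confidence and the
    lowest declared confidence anywhere among its ancestors."""
    cache = {} if _cache is None else _cache
    if memory_id in cache:
        return cache[memory_id]

    result = declared[memory_id]
    # Walk every parent; each parent's value already folds in its own ancestors.
    for parent_id in derived_from.get(memory_id, []):
        parent_value = effective_confidence(parent_id, declared, derived_from, cache)
        result = min(result, parent_value)

    cache[memory_id] = result
    return result


# Hypothetical chain: report <- analysis <- observation
declared = {"observation": 0.4, "analysis": 0.9, "report": 0.95}
derived_from = {"analysis": ["observation"], "report": ["analysis"]}

print(effective_confidence("report", declared, derived_from))       # 0.4
print(effective_confidence("observation", declared, derived_from))  # 0.4
```

The report declares 0.95, but its chain includes an observation at 0.4, so its effective confidence is 0.4. The cache keeps the walk linear in the number of edges when several memories share ancestors.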
Where to start
- Quickstart (AWS) — Aurora + Bedrock, the canonical AWS path.
- Quickstart (OSS) — in-cluster Postgres + local fastembed, no cloud creds.
- Concepts — start with the Memory record, then Provenance chain and Effective confidence.
- Deployment — Open-source for getting started, kagent on EKS for production AWS.
Compatibility
- Python: 3.10 – 3.13
- OSS storage: PostgreSQL ≥ 14 with pgvector ≥ 0.5
- AWS storage: Aurora PostgreSQL · OpenSearch (DynamoDB lands in v2.1)
- Embeddings: fastembed (local) · Bedrock Titan · any LiteLLM provider
- License: Apache 2.0