Introduction

memledger is the open-source trust layer for multi-agent AI. It is a Python library — install it via pip, point it at a vector store and an attribute store, and your agents have memory you can attribute, gate on confidence, and audit end-to-end.

OSS by default, no lock-in. Drops onto AWS managed backends (Aurora, OpenSearch, Bedrock) when you need them.

pip install "memledger[oss]"  # Postgres + pgvector + local embeddings (default)
pip install "memledger[aws]"  # Aurora PostgreSQL + Bedrock

Most memory infrastructure optimizes recall quality: how accurately the system retrieves something relevant. memledger adds a complementary layer, trust: who wrote this memory, how confident they were, what they derived it from, and how it has been used since.

In a single-agent system the distinction is academic. In a multi-agent system, where agents read and act on each other's beliefs, it is the difference between systems that fail gracefully and systems that quietly hallucinate at scale.

The shape of the trust layer

Every memledger memory carries first-class fields for attribution (created_by, session_id, namespace), confidence (confidence, hedged), and derivation (derived_from, supersedes, workflow_id, triggered_by). These are not sidecar metadata — they are the record.
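To make the shape of such a record concrete, here is a minimal sketch using a plain dataclass. It is not memledger's actual class: the name MemoryRecordSketch, the content field, the field types, and the assumption that confidence lies in [0.0, 1.0] are all illustrative guesses. Only the field names under the three groupings come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemoryRecordSketch:
    """Hypothetical stand-in for a memory record; not the library's class."""

    content: str  # assumed: the text of the memory itself

    # Attribution
    created_by: str
    session_id: str
    namespace: str

    # Confidence
    confidence: float  # assumed to be a score in [0.0, 1.0]
    hedged: bool = False  # assumed to be a boolean flag

    # Derivation (types are assumptions: IDs of other memories or events)
    derived_from: tuple[str, ...] = ()
    supersedes: Optional[str] = None
    workflow_id: Optional[str] = None
    triggered_by: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject out-of-range scores under the assumed [0.0, 1.0] convention.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


# Example: a hedged claim written by one agent and derived from an earlier memory.
record = MemoryRecordSketch(
    content="Invoice 1042 was paid",
    created_by="billing-agent",
    session_id="s-1",
    namespace="finance",
    confidence=0.7,
    hedged=True,
    derived_from=("mem-001",),
)
```

Keeping attribution, confidence, and derivation as typed fields on the record itself, rather than in a free-form metadata dict, is what lets them be validated when the memory is written and queried afterwards.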

On top of those fields memledger applies four guarantees you can audit:

  1. Provenance chains — every memory tracks its derivation. Chains span agents and sessions and form a DAG you can query.
  2. Weakest-link confidence — effective confidence at retrieval is min(declared, chain.min_confidence). A high-confidence claim derived from a low-confidence ancestor cannot outscore its weakest link.
  3. Conflict detection — every add() checks for near-duplicates in the same namespace. Conflicts dispatch to a configurable resolver and emit a CONFLICTS edge in the chain store.
  4. MAI rubric (Memory Attribution Integrity) — the rubric memledger uses to score memory quality, runnable in three tiers: deterministic (Tier 1, no LLM), RAGAS LLM-as-judge (Tier 2, provider-agnostic via LiteLLM), and structural (Tier 3, OTEL-span based). DeepEval, Phoenix Evals, LangSmith, OpenAI Evals, TruLens, and AgentCore Evaluations adapters are on the v2.2+ evaluation roadmap.
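
The weakest-link rule in guarantee 2 can be sketched in a few lines. This is an illustration under stated assumptions, not memledger's implementation: the CHAIN dictionary is a hypothetical in-memory stand-in for the chain store, and chain.min_confidence is read here as the lowest declared confidence among all ancestors in the provenance DAG.

```python
# Hypothetical chain store: memory ID -> (declared confidence, parent memory IDs).
CHAIN: dict[str, tuple[float, tuple[str, ...]]] = {
    "obs-1": (0.95, ()),
    "obs-2": (0.40, ()),
    "summary": (0.90, ("obs-1", "obs-2")),
    "plan": (0.99, ("summary",)),
}


def effective_confidence(
    memory_id: str, chain: dict[str, tuple[float, tuple[str, ...]]]
) -> float:
    """Return min(declared confidence, lowest declared confidence of any ancestor)."""
    lowest, parents = chain[memory_id]
    seen = {memory_id}
    stack = list(parents)
    # Iterative walk over the provenance DAG; `seen` avoids revisiting
    # ancestors that are reachable through more than one path.
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        declared, node_parents = chain[node]
        lowest = min(lowest, declared)
        stack.extend(node_parents)
    return lowest


print(effective_confidence("plan", CHAIN))   # 0.4, capped by obs-2
print(effective_confidence("obs-1", CHAIN))  # 0.95, no ancestors
```

Here "plan" declares 0.99 but derives, through "summary", from "obs-2" at 0.40, so its effective confidence is 0.40. A real deployment would resolve ancestors through the chain store rather than a dictionary.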

Compatibility

  • Python: 3.10 – 3.13
  • OSS storage: PostgreSQL ≥ 14 with pgvector ≥ 0.5
  • AWS storage: Aurora PostgreSQL · OpenSearch (DynamoDB lands in v2.1)
  • Embeddings: fastembed (local) · Bedrock Titan · any LiteLLM provider
  • License: Apache 2.0