Confidence flags (PASS / FLAG / FILTER)

memledger applies a three-tier policy to every search result, gating on effective confidence (not declared confidence).

The three tiers

Flag	Effective confidence	Effect
🟢 PASS	≥ `flag_threshold` (default 0.6)	Returned normally
🟡 FLAG	≥ `min_threshold` and < `flag_threshold` (default ≥ 0.4 and < 0.6)	Returned with `confidence_flag="FLAG"`; the calling agent decides how to use it
🔴 FILTER	< `min_threshold` (default 0.4)	Excluded from results entirely
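
The tier boundaries can be expressed as a small function. This is a minimal sketch for illustration, not memledger's internal code: `classify` and `ConfidenceFlag` are hypothetical names. It assumes, as the table implies, that a score equal to `flag_threshold` passes and a score equal to `min_threshold` is flagged.

```python
from enum import Enum


class ConfidenceFlag(str, Enum):
    # Hypothetical enum mirroring the three tiers in the table above.
    PASS = "PASS"
    FLAG = "FLAG"
    FILTER = "FILTER"


def classify(effective_confidence: float,
             min_threshold: float = 0.4,
             flag_threshold: float = 0.6) -> ConfidenceFlag:
    """Map an effective confidence score to a tier (hypothetical helper)."""
    if min_threshold > flag_threshold:
        raise ValueError("min_threshold must not exceed flag_threshold")
    if effective_confidence >= flag_threshold:
        return ConfidenceFlag.PASS      # at or above flag_threshold
    if effective_confidence >= min_threshold:
        return ConfidenceFlag.FLAG      # in [min_threshold, flag_threshold)
    return ConfidenceFlag.FILTER        # below min_threshold


for score in (0.85, 0.6, 0.5, 0.4, 0.39):
    print(score, classify(score).value)
```

With the defaults, 0.85 and 0.6 pass, 0.5 and 0.4 are flagged, and 0.39 is filtered.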

Configuration

```python
result = await ml.search(
    query="connection pool fix",
    namespace="/ops/incidents/payment-svc",
    confidence_policy={
        "min_threshold": 0.4,
        "flag_threshold": 0.6,
    },
)

print(result.metadata["confidence_gating"])
# {'passed': 4, 'flagged': 1, 'filtered': 2,
#  'policy': {'min': 0.4, 'flag': 0.6}, ...}
```

Thresholds can be set at the call site (as above), on the instance, or globally in memledger.yaml. When more than one is set, the per-call value takes precedence.
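
One way to picture the precedence is as layered dictionaries merged from lowest to highest priority. This is a sketch under stated assumptions: `resolve_policy` is a hypothetical helper, the merge is assumed to be key by key (a per-call `flag_threshold` alone leaves `min_threshold` to the lower layers), and instance settings are assumed to override memledger.yaml, which this page does not state.

```python
from typing import Mapping, Optional

# Defaults from the tier table.
DEFAULTS = {"min_threshold": 0.4, "flag_threshold": 0.6}


def resolve_policy(per_call: Optional[Mapping[str, float]] = None,
                   instance: Optional[Mapping[str, float]] = None,
                   global_config: Optional[Mapping[str, float]] = None) -> dict:
    """Merge policy layers key by key; later layers override earlier ones.

    ASSUMPTION: instance settings override the global memledger.yaml layer;
    the docs only state that per-call settings win.
    """
    policy = dict(DEFAULTS)
    for layer in (global_config, instance, per_call):  # lowest to highest priority
        if layer:
            policy.update(layer)
    if policy["min_threshold"] > policy["flag_threshold"]:
        raise ValueError("min_threshold must not exceed flag_threshold")
    return policy


print(resolve_policy(
    per_call={"flag_threshold": 0.7},
    instance={"min_threshold": 0.5, "flag_threshold": 0.65},
    global_config={"min_threshold": 0.3},
))
```

Here the result is `{'min_threshold': 0.5, 'flag_threshold': 0.7}`: the instance sets `min_threshold`, and the per-call `flag_threshold` overrides both lower layers.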

Per-result inspection

Every record returned by search() carries a confidence_flag attribute whose value is "PASS" or "FLAG". Filtered records are excluded from the result and cannot currently be retrieved from it; the gate counts and per-record decisions are available in result.metadata["confidence_gating"].
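
Because only PASS and FLAG records come back, agent code typically branches on confidence_flag. A minimal sketch: `Record` is a hypothetical stand-in for memledger's record type (only the confidence_flag attribute comes from this page), and `partition` is an illustrative helper.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Record:
    # Hypothetical stand-in for a search result record; only
    # confidence_flag mirrors the attribute described above.
    content: str
    confidence_flag: str  # "PASS" or "FLAG"; FILTER records never appear


def partition(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Split records into (usable directly, needs verification)."""
    trusted: List[Record] = []
    hedged: List[Record] = []
    for record in records:
        if record.confidence_flag == "PASS":
            trusted.append(record)
        elif record.confidence_flag == "FLAG":
            hedged.append(record)
        else:
            raise ValueError(f"unexpected flag: {record.confidence_flag!r}")
    return trusted, hedged


trusted, hedged = partition([
    Record("raised pool size to 50", "PASS"),
    Record("pool leak caused by retry loop", "FLAG"),
])
print(len(trusted), len(hedged))
```

Records in the second list are candidates for verification before the agent acts on them.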

What gets observed

The confidence_gating block in result.metadata exposes the passed, flagged, and filtered counts plus a per-record breakdown. To chart these values over time, see Phoenix tracing.
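
If you chart gating over time, fractions are often easier to compare across queries than raw counts. A sketch, assuming only the passed, flagged, and filtered keys shown in the Configuration example; `gating_rates` is a hypothetical helper, and the rest of the block (including the per-record breakdown) is ignored.

```python
def gating_rates(gating: dict) -> dict:
    """Convert confidence_gating counts into per-tier fractions (hypothetical helper)."""
    counts = {k: gating.get(k, 0) for k in ("passed", "flagged", "filtered")}
    total = sum(counts.values())
    if total == 0:
        # No candidates at all: report zero rates rather than dividing by zero.
        return {k: 0.0 for k in counts}
    return {k: v / total for k, v in counts.items()}


# Counts taken from the example output in Configuration.
rates = gating_rates({"passed": 4, "flagged": 1, "filtered": 2})
print({k: round(v, 3) for k, v in rates.items()})
```

For the example counts (4 passed, 1 flagged, 2 filtered) this prints `{'passed': 0.571, 'flagged': 0.143, 'filtered': 0.286}`.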

Why three tiers and not two

Two tiers would force a binary choice between blocking and allowing. In practice the FLAG tier is where the most interesting agent behavior lives: "we have a guess, but you should verify before acting." Removing it discards exactly that signal.

Three tiers also map cleanly onto how downstream agents react:

PASS → use directly in reasoning
FLAG → use, but mark the derived claim as hedged so its effective_confidence propagates the uncertainty
FILTER → never seen; cannot contaminate the chain

That separation is what makes weakest-link confidence work end-to-end: a flagged retrieval can still be useful, but its uncertainty is preserved as the chain extends.
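
Assuming weakest-link means a derived claim's effective confidence is the minimum of its inputs' effective confidences, the propagation can be sketched as follows. `derived_confidence` and `classify` are hypothetical helpers, and the thresholds are the defaults from the tier table.

```python
def classify(score: float, min_threshold: float = 0.4, flag_threshold: float = 0.6) -> str:
    """Map an effective confidence to PASS / FLAG / FILTER (hypothetical helper)."""
    if score >= flag_threshold:
        return "PASS"
    if score >= min_threshold:
        return "FLAG"
    return "FILTER"


def derived_confidence(*inputs: float) -> float:
    """Weakest link (assumed): a derived claim is only as confident as its weakest input."""
    if not inputs:
        raise ValueError("a derived claim needs at least one input")
    return min(inputs)


step1 = derived_confidence(0.9, 0.55)    # a PASS retrieval combined with a FLAG retrieval
step2 = derived_confidence(step1, 0.95)  # the chain extends with another strong input
print(step1, classify(step1))  # 0.55 FLAG
print(step2, classify(step2))  # 0.55 FLAG
```

Adding the strong 0.95 input in the second step does not raise the score: the flagged 0.55 retrieval still bounds the chain, so the claim stays hedged.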
