● Shipped · P0 · Size M · Foundation

Knowledge-Audit — Cross-Source Contradiction Detector

A 3-layer audit + 4-tier cron that catches stale facts and cross-source contradictions in a personal RAG knowledge base — caught 25+ drifted facts in 79 files on day-one deploy.

A cross-source audit layer that sits on top of Personal-RAG and continuously verifies that the corpus does not silently contradict itself. Built after the AI confidently asserted “we run on Oracle Cloud A1 VM” — infrastructure that was never registered.

At a glance

  • 3-layer audit: (1) intra-file consistency, (2) cross-file contradiction within a workspace, (3) cross-workspace contradiction
  • 4-tier cron: daily (light, 4B + Haiku 4.5 verifier) · weekly (8B + Haiku 4.5) · monthly (32B alone, deeper) · event-triggered (post-commit hook on the KB-s3 mount)
  • Runs locally on MacBook Pro M2 Max via launchd — not in the cloud
  • Caught 25+ stale facts in a 79-file corpus on day-one deploy
  • Production swap 2026-05-23, after an Eval-Framework bake-off: Haiku verifier → Grok 4.3. Accuracy: 80% (strict-match scorer) → 99% (LLM-judged)
  • Cost: ~$5/month at current cadence (daily + weekly + monthly + event-driven)
  • Output: ADHD-friendly Telegram digest: severity 🟢🟡🔴, action verb first, time-boxed, bundled 2× per day
  • Auto-fix policy (weekly/monthly): Grok propose → safety heuristics filter → Grok 4.3 judge verify → git snapshot → apply (review-and-merge gate retained)
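
The auto-fix policy in the last bullet can be read as a small gated pipeline. The following is a minimal sketch of that ordering only, not the project's actual code: the `Fix` type, `passes_heuristics`, and `run_autofix` are hypothetical names, the heuristics shown are placeholder assumptions, and the proposer, judge, snapshot, and apply steps are passed in as plain callables so no model or git API is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Fix:
    """One proposed edit (hypothetical shape, for illustration only)."""
    path: str  # file the fix would modify
    old: str   # stale text to replace
    new: str   # proposed replacement text


def passes_heuristics(fix: Fix, max_chars: int = 200) -> bool:
    """Placeholder safety heuristics: accept only small, non-empty, real changes."""
    return (
        bool(fix.old)
        and bool(fix.new)
        and fix.old != fix.new
        and len(fix.new) <= max_chars
    )


def run_autofix(
    proposals: Iterable[Fix],
    judge: Callable[[Fix], bool],
    snapshot: Callable[[], str],
    apply: Callable[[Fix], None],
) -> list[Fix]:
    """Propose -> heuristics filter -> judge verify -> snapshot -> apply."""
    approved = [f for f in proposals if passes_heuristics(f) and judge(f)]
    if not approved:
        return []  # nothing to write, so no snapshot is taken
    snapshot()  # taken once, before the first write (e.g. a git commit)
    for fix in approved:
        apply(fix)
    return approved
```

In this sketch the snapshot is skipped when nothing survives the filters, and every write happens after it, so a bad batch can be rolled back as a unit. The review-and-merge gate would sit after `apply`, for example by applying to a branch rather than directly to the main corpus.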

Sources audited

  • Memory files (~/.claude/projects/.../memory/*.md)
  • Workspace CLAUDE.md (4 files: global + 3 per-workspace)
  • Project NOTES / README / PRD (~50 files across side projects)
  • Email / Slack / meeting transcripts (LL work)
  • Confluence dump
  • KB notes (curated research)
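
Because these sources span several files and workspaces, each contradiction falls into one of the three audit layers depending on where the two conflicting claims live. A minimal sketch of that classification, assuming claims have already been extracted and normalized into key/value pairs (the `Claim` type and both function names are hypothetical, and the real system may match claims via retrieval and an LLM judge rather than exact key equality):

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Claim:
    """One extracted fact (hypothetical shape, for illustration only)."""
    workspace: str  # workspace the source file belongs to
    path: str       # file the claim was extracted from
    key: str        # normalized subject, e.g. "hosting"
    value: str      # asserted value for that subject


def layer_of(a: Claim, b: Claim) -> int:
    """Return 1 (intra-file), 2 (cross-file, same workspace) or 3 (cross-workspace)."""
    if a.workspace != b.workspace:
        return 3
    if a.path != b.path:
        return 2
    return 1


def find_contradictions(claims: list[Claim]) -> list[tuple[int, Claim, Claim]]:
    """Pair up claims about the same key that assert different values."""
    return [
        (layer_of(a, b), a, b)
        for a, b in combinations(claims, 2)
        if a.key == b.key and a.value != b.value
    ]
```

The pairwise comparison is quadratic in the number of claims, which is fine for a sketch; a larger corpus would group claims by key first.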

Stack

Python 3.11 · launchd (4 calendar-interval agents + 1 file-watcher) · Postgres 16 + pgvector (re-uses Personal-RAG retrieval) · bge-m3 embedder · Grok 4.3 (xAI) production judge · Anthropic Haiku 4.5 cheap-tier verifier · Telegram Bot API digest delivery · git snapshot before auto-fix apply
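
To illustrate the launchd side of the stack, here is a minimal sketch that builds agent definitions with Python's standard `plistlib`. `Label`, `ProgramArguments`, `StartCalendarInterval`, and `WatchPaths` are standard launchd plist keys; the helper names, labels, command line, and paths are assumptions for illustration and are not taken from the project.

```python
import plistlib


def calendar_agent(label: str, argv: list[str], hour: int, minute: int = 0) -> bytes:
    """Build an XML plist for an agent that launchd starts daily at hour:minute."""
    return plistlib.dumps({
        "Label": label,
        "ProgramArguments": argv,
        "StartCalendarInterval": {"Hour": hour, "Minute": minute},
    })


def watcher_agent(label: str, argv: list[str], watch_path: str) -> bytes:
    """Build an XML plist for an agent that launchd starts when watch_path changes."""
    return plistlib.dumps({
        "Label": label,
        "ProgramArguments": argv,
        "WatchPaths": [watch_path],
    })


# Hypothetical label and command line for a daily tier.
daily = calendar_agent(
    "com.example.knowledge-audit.daily",
    ["/usr/bin/python3", "-m", "knowledge_audit", "--tier", "daily"],
    hour=6,
)
```

The resulting bytes would be written to a `.plist` file under `~/Library/LaunchAgents/` and loaded with `launchctl`. A weekly or monthly tier would add a `Weekday` or `Day` entry to the `StartCalendarInterval` dictionary.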

Documentation

Doc             Read this for
PRD             Problem, scope, success metrics, milestones, build vs buy
Architecture    3-layer + 4-tier diagrams, data flows, auto-fix pipeline
Implementation  Code structure, prompts, judge prompt, perf numbers, reproducibility
Notes           Decision log + production swap to Grok 4.3 + gotchas
Enterprise      5 enterprise adaptations (B2B SaaS, fintech, edtech, healthcare, CX)

Why this matters

Persistent AI memory is becoming the default — Claude Projects, ChatGPT Custom GPTs, Cursor .cursorrules, Continue, every IDE. Software engineering solved drift with linters + CI + observability. AI memory has none of that. Without an audit layer, you act on stale facts for months before the failure surfaces.

Drift mechanism (the propagation chain)

memory file (stale)
  ↓
project NOTES quotes memory
  ↓
AI reads NOTES + memory, treats as ground truth
  ↓
AI confidently asserts in chat
  ↓
human acts on the assertion
  ↓
production code references infrastructure that does not exist

Catching drift at the top of the chain (memory file) costs $0.001 per audit. Catching it at the bottom (production debug) costs hours.
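
The propagation chain can be modeled as a directed graph from each source to whatever consumes it, which makes the blast radius of one stale file easy to compute. A minimal sketch, assuming the "who consumes whom" edges are already known (the `downstream` function and the node names in the usage are illustrative, not part of the project):

```python
from collections import deque


def downstream(consumers: dict[str, list[str]], stale: str) -> list[str]:
    """Breadth-first walk returning everything that directly or
    transitively consumes the stale node, in order of discovery."""
    seen = {stale}
    order: list[str] = []
    queue = deque([stale])
    while queue:
        node = queue.popleft()
        for consumer in consumers.get(node, []):
            if consumer not in seen:
                seen.add(consumer)
                order.append(consumer)
                queue.append(consumer)
    return order


# Illustrative edges mirroring the chain described above.
chain = {
    "memory file": ["project NOTES", "AI context"],
    "project NOTES": ["AI context"],
    "AI context": ["chat assertion"],
    "chat assertion": ["human action"],
    "human action": ["production code"],
}
affected = downstream(chain, "memory file")
```

Every node in `affected` inherits the stale fact, which is why auditing the first node is the cheapest place to intervene.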

Foundation pattern

Every persistent AI memory needs an audit layer. Knowledge-Audit is the reference implementation for Personal-RAG; the same pattern is what enterprises will need as B2B AI products start shipping persistent memory.
