← Back to project
● Shipped P1 Size S Foundation

AI-Canon-Crawler — Notes

Chronological decision log, scope-cut reasoning, and working-session hours.

Notes & Decision Log

Format: YYYY-MM-DD — context — decision/finding.

Decisions

  • 2026-05-22 — Motivating bug: asked Personal-RAG “current Haiku 4.5 input price?” → returned a Feb personal note (score 0.81) ahead of vendor pricing page (0.74). Root cause = same trust tier for notes and vendor docs.
  • 2026-05-22 — First instinct: add source_type=vendor_doc tag + boost in re-rank. Discarded after prototype: fragile (classifier miss ~15%), boost weights tuning never converged, every new ingestion source needed the tag.
  • 2026-05-23 — Reframed as a workspace, not a tag. Routing decision moves before retrieval. Stronger guarantee, simpler code.
  • 2026-05-23 — Three modes scoped: Mode A (audit skill), Mode B (continuous evidence gathering during conversation), Mode C (crawler daemon).
  • 2026-05-23Mode B dropped per scope-cut. Reasoning: only 1 data point of demand (the original hallucination bug), and overlap with the existing weekly kb-audit job made the marginal value unclear. Re-evaluate after Mode A/C have 4 weeks of usage.
  • 2026-05-23 — Crawler is the only writer to _canon. No manual kb_ingest path, no filesystem mount-watcher path. The constraint is the product — every leak path opened would contaminate the workspace within weeks.
  • 2026-05-23 — Allowlist as the canon’s definition: docs.anthropic.com/, docs.x.ai/, huggingface.co// (model cards only), arXiv PDFs cited in personal notes (one-shot). Explicitly excluded: blog posts, 3rd-party benchmarks, Twitter, personal notes.
  • 2026-05-23 — Daily cadence (not realtime). Vendor docs change weekly at most; daily delta crawl is ~6 min wall and stays well under rate limits.
  • 2026-05-24 — Routing rule lives in the MCP orchestrator playbook, not in a prompt. Prompts drift, routing rules don’t. Server-side enforcement = consistent across all clients.
  • 2026-05-24 — Shared bge-m3 embedder (same as Personal-RAG). Both workspaces must use the same query embedder so fallback score comparison is meaningful.
  • 2026-05-24 — Tako server v0.6.0-s3 added _canon workspace, MCP tool kb_search_canon, classifier label, and orchestration playbook section.
  • 2026-05-24fukuro-audit Claude Skill shipped with two branches: ideation (JTBD/prior-art/scope/ROI/deps/alts rubric) and production (6-category code scan + Grok 4.3 judge → P0/P1/P2 digest).
  • 2026-05-24 — Ship gate: hallucination rate on a 50-question vendor-fact set must drop below 5%. Measured <3% post-deploy.
  • 2026-05-25 — Extensibility confirmed: _canon pattern is reusable for pm_canon, design_canon, eng_canon. Same daemon shape, different allowlist + routing rule per workspace.

Gotchas

  • 2026-05-23 — Tagging approach failed not because boosts didn’t work, but because every new source pipeline needed to remember to tag. Workspaces enforce the boundary at the schema level — no remembering required.
  • 2026-05-23 — Initial allowlist included anthropic.com/news/* (blog). Removed — vendor blogs are marketing, not authoritative spec. Same reasoning ruled out third-party benchmark sites.
  • 2026-05-24 — HuggingFace huggingface.co/<org>/<model>/discussions/* was matched by an overly broad glob. Tightened pattern to model cards only (/{org}/{model} exact, no subpath). Otherwise community discussions would have leaked into canon.
  • 2026-05-24 — Routing classifier first version used regex on the query string. Missed “is Sonnet still the cheapest model?” (no obvious spec keyword). Added intent categories (model-comparison, deprecation, capability) — caught more queries without dropping precision.
  • 2026-05-24_shared vs _canon confusion in the orchestrator: AI clients defaulted to _shared for “what does Anthropic say about X?”. Fixed by explicit playbook rule: _shared = Marc’s interpretation (subjective), _canon = vendor truth (objective).
  • 2026-05-24 — httpx default User-Agent got 403 from one vendor’s CDN. Set custom fukuro-crawler/0.1 (+marc personal) UA; problem resolved.
  • 2026-05-24 — Rate throttle initially per-crawl (not per-host); when hitting Anthropic + xAI + HF in parallel, one host got hammered. Refactored throttle to per-host (1 req/s/host) — same wall time, polite to vendors.
  • 2026-05-25 — Documented _shared vs _canon distinction in tako’s serverInfo.instructions so future clients connecting cold get the rule on handshake (no client-side config needed).

Scope-cut reasoning — Mode B

Worth documenting explicitly because the temptation to “just build all three modes” was strong:

  • Mode B as proposed: during a Claude conversation, continuously gather evidence from _canon in the background and surface relevant facts proactively (sidebar-style).
  • Demand evidence: 1 data point (the Haiku pricing hallucination). Mode A on its own would have caught this in a future audit; Mode C on its own would have made it findable on next query.
  • Overlap risk: the existing weekly kb-audit job already does cross-source drift detection. Mode B would duplicate that job’s surface area without obvious differentiation.
  • Cost of building: would require streaming context injection, real-time relevance scoring, and a UI to surface findings without distracting from the conversation. Non-trivial.
  • Decision: drop. Re-evaluate after 4 weeks of Mode A/C usage. If Marc reaches for Mode A inside a conversation more than once a week, that’s the signal to revisit Mode B.

Working-session log

DateHoursWhatOutcome
2026-05-22 evening~1 hHallucination bug reproduced, root-cause = trust tier mixingDecision: workspace, not tag
2026-05-23 morning~2 hMode A/B/C design on paper; allowlist drafted; integrity contract specifiedMode B dropped before any code
2026-05-23 afternoon~2 htako server changes: _canon workspace, kb_search_canon MCP tool, classifier labelServer v0.6.0-s3 released
2026-05-24 morning~2 hCrawler daemon (Mode C): fetchers, guard, parser, launchd plistFirst crawl ingested ~180 docs
2026-05-24 afternoon~2 hfukuro-audit Claude Skill (Mode A) — ideation + production promptsSkill installed, tested both branches
2026-05-24 evening~1 hRouting playbook update in tako serverInfo.instructions; 50-question vendor-fact evalHallucination <3% verified
2026-05-25~30 min_shared vs _canon rule documented; extensibility section addedPattern locked for future canon workspaces
Total~10 hShippedSister tool to Personal-RAG live