Notes & Decision Log

Format: YYYY-MM-DD — context — decision/finding.

Decisions

2026-05-22 — Motivating bug: asked Personal-RAG “current Haiku 4.5 input price?” → returned a Feb personal note (score 0.81) ahead of vendor pricing page (0.74). Root cause = same trust tier for notes and vendor docs.
2026-05-22 — First instinct: add source_type=vendor_doc tag + boost in re-rank. Discarded after prototype: fragile (classifier miss ~15%), boost weights tuning never converged, every new ingestion source needed the tag.
2026-05-23 — Reframed as a workspace, not a tag. Routing decision moves before retrieval. Stronger guarantee, simpler code.
2026-05-23 — Three modes scoped: Mode A (audit skill), Mode B (continuous evidence gathering during conversation), Mode C (crawler daemon).
2026-05-23 — Mode B dropped per scope-cut. Reasoning: only 1 data point of demand (the original hallucination bug), and overlap with the existing weekly kb-audit job made the marginal value unclear. Re-evaluate after Mode A/C have 4 weeks of usage.
2026-05-23 — Crawler is the only writer to _canon. No manual kb_ingest path, no filesystem mount-watcher path. The constraint is the product — every leak path opened would contaminate the workspace within weeks.
2026-05-23 — Allowlist as the canon’s definition: docs.anthropic.com/, docs.x.ai/, huggingface.co// (model cards only), arXiv PDFs cited in personal notes (one-shot). Explicitly excluded: blog posts, 3rd-party benchmarks, Twitter, personal notes.
2026-05-23 — Daily cadence (not realtime). Vendor docs change weekly at most; daily delta crawl is ~6 min wall and stays well under rate limits.
2026-05-24 — Routing rule lives in the MCP orchestrator playbook, not in a prompt. Prompts drift, routing rules don’t. Server-side enforcement = consistent across all clients.
2026-05-24 — Shared bge-m3 embedder (same as Personal-RAG). Both workspaces must use the same query embedder so fallback score comparison is meaningful.
2026-05-24 — Tako server v0.6.0-s3 added _canon workspace, MCP tool kb_search_canon, classifier label, and orchestration playbook section.
2026-05-24 — fukuro-audit Claude Skill shipped with two branches: ideation (JTBD/prior-art/scope/ROI/deps/alts rubric) and production (6-category code scan + Grok 4.3 judge → P0/P1/P2 digest).
2026-05-24 — Ship gate: hallucination rate on a 50-question vendor-fact set must drop below 5%. Measured <3% post-deploy.
2026-05-25 — Extensibility confirmed: _canon pattern is reusable for pm_canon, design_canon, eng_canon. Same daemon shape, different allowlist + routing rule per workspace.

Gotchas

2026-05-23 — Tagging approach failed not because boosts didn’t work, but because every new source pipeline needed to remember to tag. Workspaces enforce the boundary at the schema level — no remembering required.
2026-05-23 — Initial allowlist included anthropic.com/news/* (blog). Removed — vendor blogs are marketing, not authoritative spec. Same reasoning ruled out third-party benchmark sites.
2026-05-24 — HuggingFace huggingface.co/<org>/<model>/discussions/* was matched by an overly broad glob. Tightened pattern to model cards only (/{org}/{model} exact, no subpath). Otherwise community discussions would have leaked into canon.
2026-05-24 — Routing classifier first version used regex on the query string. Missed “is Sonnet still the cheapest model?” (no obvious spec keyword). Added intent categories (model-comparison, deprecation, capability) — caught more queries without dropping precision.
2026-05-24 — _shared vs _canon confusion in the orchestrator: AI clients defaulted to _shared for “what does Anthropic say about X?”. Fixed by explicit playbook rule: _shared = Marc’s interpretation (subjective), _canon = vendor truth (objective).
2026-05-24 — httpx default User-Agent got 403 from one vendor’s CDN. Set custom fukuro-crawler/0.1 (+marc personal) UA; problem resolved.
2026-05-24 — Rate throttle initially per-crawl (not per-host); when hitting Anthropic + xAI + HF in parallel, one host got hammered. Refactored throttle to per-host (1 req/s/host) — same wall time, polite to vendors.
2026-05-25 — Documented _shared vs _canon distinction in tako’s serverInfo.instructions so future clients connecting cold get the rule on handshake (no client-side config needed).

Scope-cut reasoning — Mode B

Worth documenting explicitly because the temptation to “just build all three modes” was strong:

Mode B as proposed: during a Claude conversation, continuously gather evidence from _canon in the background and surface relevant facts proactively (sidebar-style).
Demand evidence: 1 data point (the Haiku pricing hallucination). Mode A on its own would have caught this in a future audit; Mode C on its own would have made it findable on next query.
Overlap risk: the existing weekly kb-audit job already does cross-source drift detection. Mode B would duplicate that job’s surface area without obvious differentiation.
Cost of building: would require streaming context injection, real-time relevance scoring, and a UI to surface findings without distracting from the conversation. Non-trivial.
Decision: drop. Re-evaluate after 4 weeks of Mode A/C usage. If Marc reaches for Mode A inside a conversation more than once a week, that’s the signal to revisit Mode B.

Reference links

Personal-RAG (parent project): personal-rag-kb
bge-m3 embedder: https://huggingface.co/BAAI/bge-m3
Anthropic docs (allowlist target): https://docs.anthropic.com
xAI docs (allowlist target): https://docs.x.ai
HuggingFace model card spec: https://huggingface.co/docs/hub/model-cards
MCP spec (tool routing): https://spec.modelcontextprotocol.io

Working-session log

Date	Hours	What	Outcome
2026-05-22 evening	~1 h	Hallucination bug reproduced, root-cause = trust tier mixing	Decision: workspace, not tag
2026-05-23 morning	~2 h	Mode A/B/C design on paper; allowlist drafted; integrity contract specified	Mode B dropped before any code
2026-05-23 afternoon	~2 h	tako server changes: `_canon` workspace, `kb_search_canon` MCP tool, classifier label	Server v0.6.0-s3 released
2026-05-24 morning	~2 h	Crawler daemon (Mode C): fetchers, guard, parser, launchd plist	First crawl ingested ~180 docs
2026-05-24 afternoon	~2 h	`fukuro-audit` Claude Skill (Mode A) — ideation + production prompts	Skill installed, tested both branches
2026-05-24 evening	~1 h	Routing playbook update in tako `serverInfo.instructions`; 50-question vendor-fact eval	Hallucination <3% verified
2026-05-25	~30 min	`_shared` vs `_canon` rule documented; extensibility section added	Pattern locked for future canon workspaces
Total	~10 h	Shipped	Sister tool to Personal-RAG live

AI-Canon-Crawler — Notes

Notes & Decision Log

Decisions

Gotchas

Scope-cut reasoning — Mode B

Reference links

Working-session log