Notes & Decision Log
Format: YYYY-MM-DD — context — decision/finding.
Decisions
- 2026-05-22 — Motivating bug: asked Personal-RAG “current Haiku 4.5 input price?” → returned a Feb personal note (score 0.81) ahead of vendor pricing page (0.74). Root cause = same trust tier for notes and vendor docs.
- 2026-05-22 — First instinct: add `source_type=vendor_doc` tag + boost in re-rank. Discarded after prototype: fragile (classifier miss ~15%), boost weights tuning never converged, every new ingestion source needed the tag.
- 2026-05-23 — Reframed as a workspace, not a tag. Routing decision moves before retrieval. Stronger guarantee, simpler code.
- 2026-05-23 — Three modes scoped: Mode A (audit skill), Mode B (continuous evidence gathering during conversation), Mode C (crawler daemon).
- 2026-05-23 — Mode B dropped per scope-cut. Reasoning: only 1 data point of demand (the original hallucination bug), and overlap with the existing weekly `kb-audit` job made the marginal value unclear. Re-evaluate after Mode A/C have 4 weeks of usage.
- 2026-05-23 — Crawler is the only writer to `_canon`. No manual `kb_ingest` path, no filesystem mount-watcher path. The constraint is the product — every leak path opened would contaminate the workspace within weeks.
- 2026-05-23 — Allowlist as the canon’s definition: docs.anthropic.com/, docs.x.ai/, huggingface.co/{org}/{model} (model cards only), arXiv PDFs cited in personal notes (one-shot). Explicitly excluded: blog posts, 3rd-party benchmarks, Twitter, personal notes.
- 2026-05-23 — Daily cadence (not realtime). Vendor docs change weekly at most; daily delta crawl is ~6 min wall and stays well under rate limits.
- 2026-05-24 — Routing rule lives in the MCP orchestrator playbook, not in a prompt. Prompts drift, routing rules don’t. Server-side enforcement = consistent across all clients.
- 2026-05-24 — Shared bge-m3 embedder (same as Personal-RAG). Both workspaces must use the same query embedder so fallback score comparison is meaningful.
- 2026-05-24 — Tako server v0.6.0-s3 added `_canon` workspace, MCP tool `kb_search_canon`, classifier label, and orchestration playbook section.
- 2026-05-24 — `fukuro-audit` Claude Skill shipped with two branches: ideation (JTBD/prior-art/scope/ROI/deps/alts rubric) and production (6-category code scan + Grok 4.3 judge → P0/P1/P2 digest).
- 2026-05-24 — Ship gate: hallucination rate on a 50-question vendor-fact set must drop below 5%. Measured <3% post-deploy.
- 2026-05-25 — Extensibility confirmed: `_canon` pattern is reusable for `pm_canon`, `design_canon`, `eng_canon`. Same daemon shape, different allowlist + routing rule per workspace.
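
A minimal sketch of the allowlist guard implied by the allowlist decision above, not the crawler's actual code. The names `ALLOWLIST`, `HF_RESERVED`, and `is_canon_url` are hypothetical, and so is the set of reserved first path segments on huggingface.co. The one-shot arXiv PDFs are left out because they are ingested per citation rather than by pattern.

```python
import re
from urllib.parse import urlsplit

# Hypothetical table: host -> regex that the whole URL path must match.
ALLOWLIST = {
    "docs.anthropic.com": re.compile(r"/.*"),
    "docs.x.ai": re.compile(r"/.*"),
    # Model cards only: exactly /{org}/{model}, no subpath like /discussions/1.
    "huggingface.co": re.compile(r"/(?P<org>[^/]+)/[^/]+/?"),
}

# Assumption: first path segments on huggingface.co that are not orgs.
HF_RESERVED = {"docs", "blog", "spaces", "datasets"}


def is_canon_url(url: str) -> bool:
    """Return True only if the URL falls inside the canon allowlist."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = parts.hostname or ""  # urlsplit lowercases the hostname
    pattern = ALLOWLIST.get(host)
    if pattern is None:
        return False  # unknown host: blogs, benchmarks, everything else
    match = pattern.fullmatch(parts.path or "/")
    if match is None:
        return False
    if host == "huggingface.co" and match.group("org") in HF_RESERVED:
        return False
    return True
```

Matching on the exact host plus a full-path regex (instead of a glob over the whole URL) is what keeps `anthropic.com/news/*` and HuggingFace discussion threads out by default: anything not listed is rejected.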
Gotchas
- 2026-05-23 — Tagging approach failed not because boosts didn’t work, but because every new source pipeline needed to remember to tag. Workspaces enforce the boundary at the schema level — no remembering required.
- 2026-05-23 — Initial allowlist included `anthropic.com/news/*` (blog). Removed — vendor blogs are marketing, not authoritative spec. Same reasoning ruled out third-party benchmark sites.
- 2026-05-24 — HuggingFace `huggingface.co/<org>/<model>/discussions/*` was matched by an overly broad glob. Tightened pattern to model cards only (`/{org}/{model}` exact, no subpath). Otherwise community discussions would have leaked into canon.
- 2026-05-24 — Routing classifier first version used regex on the query string. Missed “is Sonnet still the cheapest model?” (no obvious spec keyword). Added intent categories (model-comparison, deprecation, capability) — caught more queries without dropping precision.
- 2026-05-24 — `_shared` vs `_canon` confusion in the orchestrator: AI clients defaulted to `_shared` for “what does Anthropic say about X?”. Fixed by explicit playbook rule: `_shared` = Marc’s interpretation (subjective), `_canon` = vendor truth (objective).
- 2026-05-24 — httpx default User-Agent got 403 from one vendor’s CDN. Set custom `fukuro-crawler/0.1 (+marc personal)` UA; problem resolved.
- 2026-05-24 — Rate throttle initially per-crawl (not per-host); when hitting Anthropic + xAI + HF in parallel, one host got hammered. Refactored throttle to per-host (1 req/s/host) — same wall time, polite to vendors.
- 2026-05-25 — Documented `_shared` vs `_canon` distinction in tako’s `serverInfo.instructions` so future clients connecting cold get the rule on handshake (no client-side config needed).
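
A rough sketch of the per-host throttle from the rate-limit gotcha above (1 req/s/host), assuming an asyncio-based crawler. `PerHostThrottle`, `reserve`, and `wait` are hypothetical names, not the daemon's real API. The HTTP client is left out so the block stays stdlib-only.

```python
import asyncio
import time
from urllib.parse import urlsplit


class PerHostThrottle:
    """Allow at most one request per `min_interval` seconds for each host."""

    def __init__(self, min_interval: float = 1.0) -> None:
        self.min_interval = min_interval
        # host -> earliest time at which the next request may start
        self._next_slot: dict[str, float] = {}

    def reserve(self, url: str, now: float) -> float:
        """Book the next free slot for the URL's host; return seconds to wait."""
        host = urlsplit(url).hostname or ""
        start = max(now, self._next_slot.get(host, now))
        self._next_slot[host] = start + self.min_interval
        return start - now

    async def wait(self, url: str) -> None:
        """Sleep until this URL's host is allowed another request."""
        # reserve() has no await inside, so concurrent tasks cannot
        # claim the same slot.
        await asyncio.sleep(self.reserve(url, time.monotonic()))
```

Each fetch task would call `await throttle.wait(url)` before issuing its request. Because slots are tracked per host, parallel crawls of Anthropic, xAI, and HF do not delay each other, while requests to the same host are spaced one second apart.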
Scope-cut reasoning — Mode B
Worth documenting explicitly because the temptation to “just build all three modes” was strong:
- Mode B as proposed: during a Claude conversation, continuously gather evidence from `_canon` in the background and surface relevant facts proactively (sidebar-style).
- Demand evidence: 1 data point (the Haiku pricing hallucination). Mode A on its own would have caught this in a future audit; Mode C on its own would have made it findable on next query.
- Overlap risk: the existing weekly `kb-audit` job already does cross-source drift detection. Mode B would duplicate that job’s surface area without obvious differentiation.
- Cost of building: would require streaming context injection, real-time relevance scoring, and a UI to surface findings without distracting from the conversation. Non-trivial.
- Decision: drop. Re-evaluate after 4 weeks of Mode A/C usage. If Marc reaches for Mode A inside a conversation more than once a week, that’s the signal to revisit Mode B.
Reference links
- Personal-RAG (parent project): personal-rag-kb
- bge-m3 embedder: https://huggingface.co/BAAI/bge-m3
- Anthropic docs (allowlist target): https://docs.anthropic.com
- xAI docs (allowlist target): https://docs.x.ai
- HuggingFace model card spec: https://huggingface.co/docs/hub/model-cards
- MCP spec (tool routing): https://spec.modelcontextprotocol.io
Working-session log
| Date | Hours | What | Outcome |
|---|---|---|---|
| 2026-05-22 evening | ~1 h | Hallucination bug reproduced, root-cause = trust tier mixing | Decision: workspace, not tag |
| 2026-05-23 morning | ~2 h | Mode A/B/C design on paper; allowlist drafted; integrity contract specified | Mode B dropped before any code |
| 2026-05-23 afternoon | ~2 h | tako server changes: _canon workspace, kb_search_canon MCP tool, classifier label | Server v0.6.0-s3 released |
| 2026-05-24 morning | ~2 h | Crawler daemon (Mode C): fetchers, guard, parser, launchd plist | First crawl ingested ~180 docs |
| 2026-05-24 afternoon | ~2 h | fukuro-audit Claude Skill (Mode A) — ideation + production prompts | Skill installed, tested both branches |
| 2026-05-24 evening | ~1 h | Routing playbook update in tako serverInfo.instructions; 50-question vendor-fact eval | Hallucination <3% verified |
| 2026-05-25 | ~30 min | _shared vs _canon rule documented; extensibility section added | Pattern locked for future canon workspaces |
| Total | ~10 h | Shipped | Sister tool to Personal-RAG live |