Feature: Knowledge Bases & Chat Attachments

Your docs, your policies,
answerable in plain English.

Curate org-wide Knowledge Bases. Drop a PDF into chat for a one-session answer. Either way, PII is masked before retrieved text ever reaches the LLM, and every document obeys your role-based read / write rules.

256
Token chunks with 10% overlap, embedded by all-MiniLM-L6-v2
0
Re-encodes when promoting a session upload to a permanent KB with a matching embed model
Per-session
Strictly isolated ephemeral KB per (org, user, session)
RBAC
read_roles / write_roles enforced on every ingest and query
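
For readers who want to see what "256-token chunks with 10% overlap" means in practice, here is a minimal sketch of sliding-window chunking. It is an illustration, not Zenzo's actual implementation: the function name `chunk_tokens` is hypothetical, the input is assumed to be already tokenized, and the 10% overlap is assumed to round down to 25 tokens.

```python
from typing import List

CHUNK_SIZE = 256                   # tokens per chunk, as quoted above
OVERLAP = int(CHUNK_SIZE * 0.10)   # 10% overlap; rounding down to 25 is an assumption


def chunk_tokens(tokens: List[str], size: int = CHUNK_SIZE,
                 overlap: int = OVERLAP) -> List[List[str]]:
    """Split a token list into fixed-size windows that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks
```

Each chunk starts `size - overlap` tokens after the previous one, so the tail of one chunk reappears at the head of the next and a sentence split at a boundary is still retrievable in one piece.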

Curate answers to the questions your team always asks

One config block defines a KB. The CLI ingests PDFs, URLs, or whole directories. A SQLite sidecar dedups by content hash and locks the embed model.
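
A rough sketch of how a SQLite sidecar could handle both jobs follows. Everything here is assumed for illustration: the table names (`documents`, `meta`), the helper names, and the choice of SHA-256 as the content hash are not taken from Zenzo's code.

```python
import hashlib
import sqlite3


def open_sidecar(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) sidecar schema if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " content_hash TEXT NOT NULL UNIQUE,"
        " source TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    return conn


def lock_embed_model(conn: sqlite3.Connection, model: str) -> None:
    """Record the embed model on first use; reject any different model later."""
    row = conn.execute("SELECT value FROM meta WHERE key = 'embed_model'").fetchone()
    if row is None:
        with conn:  # commits on success
            conn.execute("INSERT INTO meta (key, value) VALUES ('embed_model', ?)", (model,))
    elif row[0] != model:
        raise ValueError(f"embed_model_mismatch: index uses {row[0]!r}, got {model!r}")


def ingest(conn: sqlite3.Connection, source: str, content: bytes) -> bool:
    """Return True if the document is new, False if its hash is already recorded."""
    digest = hashlib.sha256(content).hexdigest()  # SHA-256 is an assumption
    try:
        with conn:
            conn.execute(
                "INSERT INTO documents (content_hash, source) VALUES (?, ?)",
                (digest, source),
            )
    except sqlite3.IntegrityError:
        return False  # the UNIQUE constraint caught a duplicate
    return True
```

Letting the UNIQUE constraint decide, rather than checking first and inserting second, keeps dedup correct even when two ingests race.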

📚
Many KBs per org
Define multiple KBs in config/config.yaml with their own read / write roles, descriptions, and retention.
🧬
Content-hash dedup
Re-ingest the same PDF — Zenzo skips the work. A UNIQUE constraint in the sidecar is the source of truth.
🔒
Embed-model lock-in
Swapping the embed model without calling rebuild_index() is rejected, so your vectors stay coherent.
🧹
Drag-and-drop UI
Bulk upload, chunk preview, semantic search, and per-doc delete confirmation.
📡
Intent-routed retrieval
When the top KB chunk scores above 0.65 and the question has no SQL aggregation tokens, Zenzo answers from docs — not from your warehouse.
🛡️
PII-masked chunks
Email, phone, SSN, card and IPv4 patterns are masked by the Privacy Firewall before retrieved text reaches the LLM.
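
To make the last two cards concrete, here is a small sketch of the routing rule and of pattern-based masking. It is a simplified illustration under stated assumptions: the aggregation token list, the regular expressions, the placeholder labels, and the function names are all hypothetical, and the real Privacy Firewall patterns are not published on this page.

```python
import re

SCORE_THRESHOLD = 0.65  # threshold quoted above
# Assumed token list; the page does not enumerate the real one.
SQL_AGG_TOKENS = {"count", "sum", "avg", "average", "total", "min", "max"}

# Illustrative patterns only; order matters (specific formats before phone).
PII_PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("CARD", re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")),
    ("IPV4", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("PHONE", re.compile(r"(?<!\d)(?:\+?\d{1,3}[ -])?\(?\d{3}\)?[ -]\d{3}[ -]\d{4}\b")),
]


def answer_from_docs(question: str, top_score: float) -> bool:
    """True when the top chunk is strong enough and the question has no aggregation words."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    return top_score > SCORE_THRESHOLD and not (words & SQL_AGG_TOKENS)


def mask_pii(text: str) -> str:
    """Replace each matched pattern with a bracketed label before the text goes to the LLM."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text
```

In this sketch a question such as "What does the renewal policy say?" with a top score of 0.82 would be answered from docs, while one containing "total" would fall through to the SQL path.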

Drop a file. Ask. Move on.

Every chat session gets its own ephemeral KB collection, scoped strictly to (org, user, session). Uploads are routed by extension + MIME + magic-byte sniff so spreadsheets land in SQL and policies land in retrieval — automatically.
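
One way to picture the three-signal routing is as a vote: extension, MIME type, and magic bytes each suggest a destination, and any disagreement means the user is asked. The sketch below assumes that voting rule, and its lookup tables and names (`route_upload`, `"sql"`, `"kb"`, `"ask"`) are illustrative rather than Zenzo's own. Content-based checks such as "wide" CSVs or array-of-objects JSON are left out.

```python
from pathlib import Path
from typing import Optional

# Hypothetical lookup tables; the real product's lists are not published here.
EXT_ROUTES = {".csv": "sql", ".xlsx": "sql", ".xls": "sql",
              ".pdf": "kb", ".html": "kb", ".md": "kb", ".txt": "kb"}
MIME_ROUTES = {
    "text/csv": "sql",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "sql",
    "application/pdf": "kb",
    "text/html": "kb",
    "text/markdown": "kb",
    "text/plain": "kb",
}


def magic_route(head: bytes) -> Optional[str]:
    """Sniff the first bytes; only the PDF signature is recognised in this sketch."""
    return "kb" if head.startswith(b"%PDF-") else None


def route_upload(filename: str, mime: str, head: bytes) -> str:
    """Return 'sql', 'kb', or 'ask' when signals are missing or disagree."""
    votes = {
        EXT_ROUTES.get(Path(filename).suffix.lower()),
        MIME_ROUTES.get(mime.lower()),
        magic_route(head),
    }
    votes.discard(None)  # signals with no opinion do not vote
    return votes.pop() if len(votes) == 1 else "ask"
```

A file named sales.csv that arrives with a PDF signature would produce conflicting votes, so this sketch returns "ask" instead of guessing.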

📂
Smart routing
Wide CSV / Excel / array-of-objects → SQL pipeline. PDF / HTML / Markdown / text → ephemeral KB. Ambiguous? Zenzo asks.
🚪
Strict isolation
Collection name = _session_{org}_{user}_{sha256(session)[:16]}. A neighbour's session cannot be queried.
♻️
Zero-recompute promotion
Promote a session doc to a permanent KB — Zenzo re-uses the existing embeddings when the embed models match.
🧯
Janitor reaping
A background janitor honours session_ttl_hours; run chat_attach reap from the CLI to reap on demand.
📏
Configurable upload caps
Per-doc MB, per-KB doc count, per-user MB/day, per-user uploads/hour — all configurable.
🏷️
Enumerated outcomes
Embed-model mismatch is first-class: status='failed', reason='embed_model_mismatch' — never a silent corruption.
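
The collection-name formula in the Strict isolation card can be sketched in a few lines. The function name is hypothetical, and UTF-8 encoding of the session ID and the hex form of the digest are assumptions; only the name template itself comes from the text above.

```python
import hashlib


def session_collection(org: str, user: str, session_id: str) -> str:
    """Build the per-session collection name: _session_{org}_{user}_{sha256(session)[:16]}."""
    # Assumption: the session ID is hashed as UTF-8 and the hex digest is truncated.
    digest = hashlib.sha256(session_id.encode("utf-8")).hexdigest()[:16]
    return f"_session_{org}_{user}_{digest}"
```

Because the name is derived from all three of org, user, and session, a query scoped to one session has no collection name in common with any other session's uploads.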

Built for admins, not just analysts

Ingest from anywhere
kb_admin ingest --kb company_docs --source ./docs/policy.pdf
kb_admin ingest --kb company_docs --url https://...
kb_admin ingest --kb company_docs --bulk ./docs
Search & manage
kb_admin search --kb company_docs --query "data retention" --top-k 5
kb_admin list --kb company_docs
kb_admin rebuild --kb company_docs
Chat-attach ops
chat_attach list-sessions
chat_attach reap
chat_attach promote --upload <doc_id> --kb company_docs
Smoke test
python _kb_smoke.py — builds a small PDF + markdown, ingests, searches, dedup-checks.
"Our analysts ask 'what does the renewal policy say' in the same chat where they ask 'what's our renewal pipeline'. Same copilot. Both audited."
— Head of Data, EU FinTech

Turn your docs into an answerable corpus

One YAML block, one ingest command, one secure UI. Your policies become first-class data.

Read the security model