Getting Started
Set up Kiri Chat — a RAG-powered chatbot that uses your documentation as a knowledge base, built with docfx, Qdrant vector database, and Ollama local LLM.
What is this project?
Kiri Chat is a RAG (Retrieval-Augmented Generation) system that transforms static markdown documentation into an interactive chat experience. Instead of traditional keyword search, users ask natural language questions and receive accurate, contextual answers sourced directly from the documentation.
This project uses docfx to generate the documentation site, but the core functionality is the chat system that lets users have conversations with an AI assistant that only answers based on your documentation content.
Features
- Static documentation site generated by docfx from markdown files
- Kiri Chat widget — local RAG chatbot that answers questions using your documentation as context
- Qdrant vector database running in Docker for fast semantic search
- Ollama integration for local embeddings (
nomic-embed-text) and chat (gemma:2b) - Header-based chunking preserves document structure when indexing
Architecture
flowchart LR
A[docfx site<br/>markdown] --> B[Kiri Chat<br/>FastAPI /chat]
B --> C[Qdrant<br/>Docker]
B <--> D[Ollama<br/>127.0.0.1:11434]
D -.->|embeddings| C
Prerequisites
Pull required Ollama models:
ollama pull nomic-embed-text
ollama pull gemma:2b
Scripts
| Script | Description |
|---|---|
pnpm run dev |
Start docfx site and FastAPI chat server |
pnpm run rag:qdrant:up |
Start Qdrant Docker container |
pnpm run rag:qdrant:down |
Stop Qdrant Docker container |
pnpm run rag:index |
Index all markdown files into Qdrant |
pnpm run rag:setup |
One-command setup: start Qdrant + index docs |
Quick Start
# Install dependencies
pnpm install
# Set up RAG (start Qdrant + index docs)
pnpm run rag:setup
# Start development server with chat API
pnpm run dev
The Kiri Chat docfx site will be available at http://localhost:8080 and the chat API at http://127.0.0.1:8000.
How It Works
1. Indexing (chat-api/index_docs.py)
- Recursively finds all
.mdfiles in the project (excludes_site/,qdrant_storage/) - Chunks documents by markdown headers (
#,##,###) - Generates embeddings using Ollama
nomic-embed-text(384 dimensions) - Stores vectors in Qdrant collection
docfx-docs
2. Kiri Chat RAG (chat-api/main.py)
- User sends message to
/chatendpoint via the chat widget - Query is embedded using
nomic-embed-text - Qdrant performs semantic + keyword search (top 5 results)
- Retrieved context is injected as system prompt
gemma:2bgenerates answer based on documentation context- Source links are returned for attribution
Configuration
Qdrant (docker-compose.yml)
- REST API:
http://localhost:6333 - gRPC:
http://localhost:6334 - Storage:
./qdrant_storage(persistent volume) - Telemetry: Disabled
Kiri Chat API (chat-api/main.py)
- Qdrant URL:
http://localhost:6333 - Ollama URL:
http://127.0.0.1:11434 - Embed Model:
nomic-embed-text - Chat Model:
gemma:2b - Collection:
docfx-docs
Chat Widget (chat-button.js)
- API Endpoint:
http://localhost:8000/chat - Auto-inject: Enabled (appends to
document.body) - Markdown rendering: Lazy-loaded
marked.jsfrom CDN
Project Structure
docfx-site/
├── chat-button.js # Kiri Chat web component (<chat-button>)
├── chat-api/
│ ├── main.py # Kiri Chat FastAPI RAG endpoint
│ ├── index_docs.py # Markdown indexing script
│ └── requirements.txt # Python dependencies
├── docs/ # Documentation markdown files
│ ├── introduction.md # Kiri Chat overview
│ ├── getting-started.md
│ └── chat-window.md # Chat widget docs
├── docker-compose.yml # Qdrant container definition
├── docfx.json # docfx configuration
├── package.json # pnpm scripts
└── index.md # Kiri Chat homepage
API Endpoints
POST /chat
Chat with your documentation using RAG.
Request:
{
"message": "How do I get started?"
}
Response:
{
"response": "Based on the documentation..."
}
GET /health
Health check endpoint.
Response:
{
"status": "healthy"
}