Getting Started

Set up Kiri Chat — a RAG-powered chatbot that uses your documentation as a knowledge base, built with docfx, Qdrant vector database, and Ollama local LLM.

What is this project?

Kiri Chat is a RAG (Retrieval-Augmented Generation) system that transforms static markdown documentation into an interactive chat experience. Instead of traditional keyword search, users ask natural language questions and receive accurate, contextual answers sourced directly from the documentation.

This project uses docfx to generate the documentation site, but the core functionality is the chat system that lets users have conversations with an AI assistant that only answers based on your documentation content.

Features

Static documentation site generated by docfx from markdown files
Kiri Chat widget — local RAG chatbot that answers questions using your documentation as context
Qdrant vector database running in Docker for fast semantic search
Ollama integration for local embeddings (nomic-embed-text) and chat (gemma:2b)
Header-based chunking preserves document structure when indexing

Architecture

flowchart LR
    A[docfx site<br/>markdown] --> B[Kiri Chat<br/>FastAPI /chat]
    B --> C[Qdrant<br/>Docker]
    B <--> D[Ollama<br/>127.0.0.1:11434]
    D -.->|embeddings| C

Prerequisites

.NET SDK (for docfx)
Ollama running on 127.0.0.1:11434
Docker (for Qdrant)
pnpm (package manager)

Pull required Ollama models:

ollama pull nomic-embed-text
ollama pull gemma:2b

Scripts

Script	Description
`pnpm run dev`	Start docfx site and FastAPI chat server
`pnpm run rag:qdrant:up`	Start Qdrant Docker container
`pnpm run rag:qdrant:down`	Stop Qdrant Docker container
`pnpm run rag:index`	Index all markdown files into Qdrant
`pnpm run rag:setup`	One-command setup: start Qdrant + index docs

Quick Start

# Install dependencies
pnpm install

# Set up RAG (start Qdrant + index docs)
pnpm run rag:setup

# Start development server with chat API
pnpm run dev

The Kiri Chat docfx site will be available at http://localhost:8080 and the chat API at http://127.0.0.1:8000.

How It Works

1. Indexing (`chat-api/index_docs.py`)

Recursively finds all .md files in the project (excludes _site/, qdrant_storage/)
Chunks documents by markdown headers (#, ##, ###)
Generates embeddings using Ollama nomic-embed-text (384 dimensions)
Stores vectors in Qdrant collection docfx-docs

2. Kiri Chat RAG (`chat-api/main.py`)

User sends message to /chat endpoint via the chat widget
Query is embedded using nomic-embed-text
Qdrant performs semantic + keyword search (top 5 results)
Retrieved context is injected as system prompt
gemma:2b generates answer based on documentation context
Source links are returned for attribution

Configuration

Qdrant (docker-compose.yml)

REST API: http://localhost:6333
gRPC: http://localhost:6334
Storage: ./qdrant_storage (persistent volume)
Telemetry: Disabled

Kiri Chat API (chat-api/main.py)

Qdrant URL: http://localhost:6333
Ollama URL: http://127.0.0.1:11434
Embed Model: nomic-embed-text
Chat Model: gemma:2b
Collection: docfx-docs

Chat Widget (`chat-button.js`)

API Endpoint: http://localhost:8000/chat
Auto-inject: Enabled (appends to document.body)
Markdown rendering: Lazy-loaded marked.js from CDN

Project Structure

docfx-site/
├── chat-button.js       # Kiri Chat web component (<chat-button>)
├── chat-api/
│   ├── main.py           # Kiri Chat FastAPI RAG endpoint
│   ├── index_docs.py     # Markdown indexing script
│   └── requirements.txt  # Python dependencies
├── docs/                 # Documentation markdown files
│   ├── introduction.md   # Kiri Chat overview
│   ├── getting-started.md
│   └── chat-window.md    # Chat widget docs
├── docker-compose.yml    # Qdrant container definition
├── docfx.json           # docfx configuration
├── package.json         # pnpm scripts
└── index.md             # Kiri Chat homepage

API Endpoints

POST `/chat`

Chat with your documentation using RAG.

Request:

{
  "message": "How do I get started?"
}

Response:

{
  "response": "Based on the documentation..."
}

GET `/health`

Health check endpoint.