Qdrant RAG Vector Index Workflow

This page documents a reusable Qdrant RAG workflow using committed frontend manifests, backend-generated database manifests, local embeddings, Qdrant vector storage, and an answer endpoint. The commands and paths can be used directly in this codebase, and the workflow is generic enough to adapt to similar Docker-based RAG systems.

Overview

Use Qdrant as the vector store for normalized RAG chunks.
Keep source extraction separate from vector indexing.
Export frontend-owned content into committed JSON manifests.
Export backend database content into generated JSON manifests.
Embed chunk text with the same embedding model used for query-time retrieval.
Store vectors and source payloads in Qdrant.
Generate answers only from selected retrieved evidence.
Save query events and selected source metadata in MySQL if analytics or debugging is needed.
Run reindex only after verifying all expected manifests are available.

System flow

The RAG system has two main phases: indexing time and query time. Indexing time builds the Qdrant collection. Query time embeds the user query, retrieves matching chunks, selects evidence, and generates an answer.

Indexing time:
source content
  -> frontend manifest export
  -> backend database source export
  -> manifest validation
  -> chunk embedding
  -> Qdrant collection recreate
  -> vector upsert with payload metadata

Query time:
user question
  -> query embedding
  -> Qdrant vector search
  -> lexical retrieval merge
  -> evidence selection
  -> answer generation
  -> query event logging

Indexed source categories

The same indexing flow can support frontend-owned content and backend-owned content. Frontend content is usually exported from static files, registries, rendered pages, or metadata files. Backend content is usually exported from database tables or internal services.

Source category	Storage owner	Content mode	Manifest type
Content 1	Frontend source files	full_text	Frontend source manifest
Content 2	Frontend rendered files	full_text_or_rendered_output	Frontend source manifest
Content 3	Frontend metadata files	metadata_only	Frontend source manifest
Content 4	Backend database	metadata_only	Backend-generated manifest
Content 5	Backend service output	metadata_plus_internal_brief	Backend-generated manifest

Frontend manifests are generated from files that already exist in the frontend application or static assets.
Backend manifests are generated from database-backed or service-backed content.
Full-text sources can support direct technical answers.
Rendered-output sources can support answers based on visible rendered content, tables, and output text.
Metadata-only sources should be used for discovery and navigation.
Backend database sources must be exported from the active database environment before reindexing.

Chunk format

Every source becomes one or more normalized chunks. The indexer does not need to know the original source implementation once the chunk format is produced.

type LabChunk = {
  sourceType: string
  sourceId: string
  chunkId: string
  parentId?: string
  title: string
  description?: string
  category?: string
  sectionTitle?: string
  url: string
  externalUrl?: string
  contentMode: "full_text" | "full_text_or_rendered_output" | "metadata_only" | "metadata_plus_internal_brief"
  text: string
  metadata: Record<string, string | number | boolean | string[]>
}

chunkId must be unique across all manifests.
url should point to the internal page used for citations.
text is the field embedded into the vector store.
metadata should stay JSON-safe.
contentMode controls answer grounding behavior.
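
The rules above can be checked mechanically before a manifest is written. The following is a minimal sketch, not the project's actual validator: validate_chunks is a hypothetical helper, and it assumes that the non-optional string fields of LabChunk must also be non-empty.

```python
import json

REQUIRED_FIELDS = ("sourceType", "sourceId", "chunkId", "title", "url", "contentMode", "text")
CONTENT_MODES = {
    "full_text",
    "full_text_or_rendered_output",
    "metadata_only",
    "metadata_plus_internal_brief",
}


def validate_chunks(chunks):
    """Return a list of problems; an empty list means the chunks look valid."""
    problems = []
    seen_ids = set()
    for position, chunk in enumerate(chunks):
        label = chunk.get("chunkId") or f"chunk at position {position}"

        # Assumption: required string fields must be present and non-empty.
        for field in REQUIRED_FIELDS:
            value = chunk.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"{label}: missing or empty {field}")

        mode = chunk.get("contentMode")
        if isinstance(mode, str) and mode not in CONTENT_MODES:
            problems.append(f"{label}: unknown contentMode {mode!r}")

        # chunkId must be unique across every manifest passed in.
        chunk_id = chunk.get("chunkId")
        if isinstance(chunk_id, str):
            if chunk_id in seen_ids:
                problems.append(f"{label}: duplicate chunkId")
            seen_ids.add(chunk_id)

        # metadata must survive JSON serialization (no sets, NaN, custom objects).
        try:
            json.dumps(chunk.get("metadata", {}), allow_nan=False)
        except (TypeError, ValueError):
            problems.append(f"{label}: metadata is not JSON-safe")
    return problems
```

Passing the chunks from all manifests in one call is what makes the uniqueness check span manifests rather than a single file.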

Local Docker services

Run Qdrant as an internal Docker Compose service. The backend talks to Qdrant through the Compose service name.

qdrant:
  image: qdrant/qdrant
  container_name: ubuntu-web-qdrant
  restart: unless-stopped
  ports:
    - "127.0.0.1:6333:6333"
  volumes:
    - qdrant_data:/qdrant/storage

volumes:
  qdrant_data:

Bind Qdrant to 127.0.0.1 for local inspection.
Use the Docker service name inside backend containers.
Persist Qdrant data with qdrant_data.
Do not expose Qdrant publicly in production unless a separate security layer is added.

Backend mounts and environment

The backend container needs read access to committed frontend manifests and write access to generated backend manifests.

backend:
  volumes:
    - ./backend:/workspace
    - ./frontend/public/lab-index:/frontend-lab-index:ro

  environment:
    - QDRANT_URL=http://qdrant:6333
    - QDRANT_COLLECTION=mlnotebooks_lab_chunks
    - LAB_FRONTEND_INDEX_DIR=/frontend-lab-index
    - LAB_BACKEND_INDEX_DIR=/workspace/generated/lab-index
    - LAB_TOP_K=10

LAB_FRONTEND_INDEX_DIR points to committed frontend manifest files.
LAB_BACKEND_INDEX_DIR points to backend-generated manifest files.
QDRANT_URL should use http://qdrant:6333 inside Docker.
If reindex runs from a worker container, mount the same paths there as well.
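
On the Python side, these variables can be read once into a small settings object. This is a sketch under assumptions: LabSettings and load_lab_settings are hypothetical names, and the fallback defaults simply mirror the Compose values shown above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LabSettings:
    qdrant_url: str
    qdrant_collection: str
    frontend_index_dir: str
    backend_index_dir: str
    top_k: int


def load_lab_settings(env=None):
    """Read Lab settings from an environment mapping (os.environ by default)."""
    env = os.environ if env is None else env
    top_k = int(env.get("LAB_TOP_K", "10"))
    if top_k < 1:
        raise ValueError("LAB_TOP_K must be a positive integer")
    return LabSettings(
        qdrant_url=env.get("QDRANT_URL", "http://qdrant:6333"),
        qdrant_collection=env.get("QDRANT_COLLECTION", "mlnotebooks_lab_chunks"),
        frontend_index_dir=env.get("LAB_FRONTEND_INDEX_DIR", "/frontend-lab-index"),
        backend_index_dir=env.get("LAB_BACKEND_INDEX_DIR", "/workspace/generated/lab-index"),
        top_k=top_k,
    )
```

Accepting the mapping as a parameter keeps the loader testable without touching the real process environment.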

Embedding and answer models

Qdrant stores vectors but does not create embeddings in this setup. The backend creates embeddings and writes the vectors to Qdrant.

LAB_EMBEDDING_MODEL = "BAAI/bge-small-en-v1.5"
LAB_VECTOR_SIZE = 384
LAB_ANSWER_MODEL = "gpt-5.4-nano"

Use the same embedding model during indexing and query-time retrieval.
The Qdrant vector size must match the embedding model output size.
Changing the embedding model requires a full reindex.
Changing only the answer model does not require reindexing.
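
The reindex rule for model changes can be expressed as a comparison between the configuration used for the last index build and the current one. ModelConfig and reindex_required are hypothetical names used only for illustration; the source does not say how the backend tracks the previous configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    embedding_model: str
    vector_size: int
    answer_model: str


def reindex_required(indexed_with, current):
    """True when stored vectors no longer match what the current config produces."""
    # The answer model is deliberately ignored: it never touches stored vectors.
    return (
        indexed_with.embedding_model != current.embedding_model
        or indexed_with.vector_size != current.vector_size
    )
```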

Frontend manifest export

Run the frontend export whenever frontend-owned sources change. The generated manifest files are committed to the repository, which keeps the deployment flow simple.

docker compose exec frontend npm run lab:export

The export writes normalized JSON manifests into the frontend Lab index directory.

frontend/public/lab-index/

Commit the generated manifests with the content change.

git add frontend/public/lab-index/*.json
git commit -m "Update lab index manifests"

Backend database source export

Backend database sources are exported from the active database environment. This avoids committing environment-specific database content.

docker compose exec backend python scripts/lab_export_courses.py

The generated backend manifest is written into the backend Lab index directory.

backend/generated/lab-index/

Run this locally for local database content.
Run this in production for production database content.
Run it before reindexing when backend database sources change.

Manifest validation

Always validate manifests before running reindex. The reindex script recreates the Qdrant collection, so a missing manifest can create a partial index.

docker compose exec backend python scripts/lab_check_manifests.py

The output should list every expected source group with a non-zero count.

frontend source manifests: > 0
backend database manifests: > 0
total: expected total chunk count

If the frontend manifest count is zero, check /frontend-lab-index inside the backend container.
If the backend database manifest count is zero, run the backend export script.
Do not run reindex until expected manifest counts are correct.
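
A manifest check of this kind can be sketched as follows. This is not the contents of lab_check_manifests.py; it assumes each manifest is a *.json file holding either a JSON array of chunks or an object with a "chunks" array, and count_chunks and check_manifests are hypothetical names.

```python
import json
from pathlib import Path


def count_chunks(index_dir):
    """Count chunks across all *.json manifests in one directory (0 if it is missing)."""
    directory = Path(index_dir)
    if not directory.is_dir():
        return 0
    total = 0
    for manifest_path in sorted(directory.glob("*.json")):
        data = json.loads(manifest_path.read_text(encoding="utf-8"))
        # Assumed manifest shapes: a bare list, or {"chunks": [...]}.
        chunks = data.get("chunks", []) if isinstance(data, dict) else data
        total += len(chunks)
    return total


def check_manifests(frontend_dir, backend_dir):
    """Return per-group counts and raise if any expected group is empty."""
    counts = {
        "frontend source manifests": count_chunks(frontend_dir),
        "backend database manifests": count_chunks(backend_dir),
    }
    empty = [name for name, count in counts.items() if count == 0]
    if empty:
        raise RuntimeError("empty manifest groups, do not reindex: " + ", ".join(empty))
    counts["total"] = sum(counts.values())
    return counts
```

Raising on an empty group, rather than only printing a zero, makes it harder to continue into a reindex by accident when the check is chained with other commands.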

Reindex execution

The reindex script loads all manifests, embeds chunk text, recreates the Qdrant collection, and upserts vector points with payload metadata.

docker compose exec backend python scripts/lab_reindex.py

The final output should show the total chunk count and the Qdrant point count, and the two numbers should match.

"total_chunks": 1274
"points_count": 1274

Reindex is required after source manifests change.
Reindex is required after backend database source export changes.
Reindex is required after changing the embedding model or vector size.
Reindex is not required for UI-only changes.
Reindex is not required for answer prompt-only changes.

Qdrant checks

Use the Qdrant check script to confirm backend connectivity and the collection list.

docker compose exec backend python scripts/lab_check_qdrant.py

Use the collection count helper to confirm the indexed point count.

docker compose exec backend python - <<'PY'
from services.lab.qdrant_store import get_lab_collection_count
print(get_lab_collection_count())
PY

Qdrant point payload

Each Qdrant point stores the embedding vector and the payload needed for retrieval, evidence selection, answer generation, and citation rendering.

payload = {
    "sourceType": chunk["sourceType"],
    "sourceId": chunk["sourceId"],
    "chunkId": chunk["chunkId"],
    "parentId": chunk.get("parentId", ""),
    "title": chunk["title"],
    "description": chunk.get("description", ""),
    "category": chunk.get("category", ""),
    "sectionTitle": chunk.get("sectionTitle", ""),
    "url": chunk["url"],
    "externalUrl": chunk.get("externalUrl", ""),
    "contentMode": chunk["contentMode"],
    "text": chunk["text"],
    "metadata": chunk.get("metadata", {}),
}

Keeping text in the payload lets the answer endpoint use selected evidence without reading the original source files at query time.
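
Qdrant point IDs must be unsigned integers or UUIDs, so a string chunkId cannot be used as the point ID directly. One possible approach, shown here as a sketch, is to derive a deterministic UUID from chunkId so that re-running the indexer produces the same IDs. The namespace value and the helper names point_id_for and build_point are assumptions, not part of the original code.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
LAB_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "lab-chunks")

REQUIRED_KEYS = ("sourceType", "sourceId", "chunkId", "title", "url", "contentMode", "text")
OPTIONAL_KEYS = ("parentId", "description", "category", "sectionTitle", "externalUrl")


def point_id_for(chunk_id):
    """Map a chunkId string to a stable UUID string usable as a point ID."""
    return str(uuid.uuid5(LAB_NAMESPACE, chunk_id))


def build_point(chunk, vector):
    """Return a plain dict with the id, vector, and payload for one chunk."""
    payload = {key: chunk[key] for key in REQUIRED_KEYS}
    # Optional fields default to an empty string, as in the payload above.
    payload.update({key: chunk.get(key, "") for key in OPTIONAL_KEYS})
    payload["metadata"] = chunk.get("metadata", {})
    return {
        "id": point_id_for(chunk["chunkId"]),
        "vector": list(vector),
        "payload": payload,
    }
```

Because the ID depends only on chunkId, the same chunk always maps to the same point, which also makes duplicate chunkId values visible as overwritten points.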

Retrieval and evidence selection

Retrieval should combine vector search with a generic lexical layer. This improves exact technical term matching without hardcoding case-specific query expansions.

query
  -> FastEmbed vector
  -> Qdrant vector candidates
  -> BM25-style lexical candidates
  -> score merge
  -> page diversification
  -> evidence selection
  -> answer prompt

Vector search handles semantic similarity.
Lexical search handles commands, package names, dataset names, and exact terms.
Page diversification prevents one source from dominating all candidates.
Evidence selection sends only the strongest chunks to the answer model.
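
The merge and diversification steps can be illustrated with a small sketch. The source only says "score merge" and does not name a method; reciprocal rank fusion is used here as one common, swapped-in option, and the constant k=60 and the cap of two chunks per source are illustrative assumptions.

```python
from collections import defaultdict


def rrf_merge(vector_ranked, lexical_ranked, k=60):
    """Fuse two ranked lists of chunk ids with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranked in (vector_ranked, lexical_ranked):
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk id for stable output.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


def diversify(ranked_ids, source_of, max_per_source=2):
    """Keep the ranking order but cap how many chunks one source contributes."""
    kept = []
    per_source = defaultdict(int)
    for chunk_id in ranked_ids:
        source = source_of[chunk_id]
        if per_source[source] < max_per_source:
            kept.append(chunk_id)
            per_source[source] += 1
    return kept
```

Rank-based fusion avoids having to normalize cosine similarities and BM25-style scores onto a common scale, which is why it is a convenient default for a hybrid retriever.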

Answer endpoint

The answer endpoint receives the user question, performs retrieval, selects evidence, calls the answer model, returns the answer with its citations, and optionally logs the event.

POST /api/lab/answer

{
  "query": "How do I convert notebooks to HTML?",
  "limit": 10
}

The response should include the answer, the citations, the retrieved count, and an insufficient-context flag.
Inline citations should point to internal source URLs.
Metadata-only sources should be used for discovery and navigation, not full-detail claims.
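
A response with these fields could be assembled as in the sketch below. The exact JSON keys are assumptions (the source lists the fields but not their names in the response body), build_answer_response is a hypothetical helper, and treating "no selected evidence" as insufficient context is one simple rule, not necessarily the one the backend uses.

```python
def build_answer_response(answer_text, selected_chunks, retrieved_count):
    """Assemble the answer payload from the selected evidence chunks."""
    citations = []
    seen_urls = set()
    for chunk in selected_chunks:
        if chunk["url"] in seen_urls:
            continue  # cite each internal page only once
        seen_urls.add(chunk["url"])
        citations.append({
            "title": chunk["title"],
            "url": chunk["url"],
            # Exposed so callers can treat metadata_only sources as discovery links.
            "contentMode": chunk["contentMode"],
        })
    return {
        "answer": answer_text,
        "citations": citations,
        "retrieved_count": retrieved_count,
        # Assumption: no selected evidence means the context was insufficient.
        "insufficient_context": not selected_chunks,
    }
```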

Query event logging

Query event logging stores each answer request in MySQL. This is analytics and debugging history, not a persistent chat session.

Field group	Stored data
User fields	user_id, user_email, user_tier
Anonymous fields	anonymous_id
Query and answer	query_text, answer_text
Source metadata	citations_json, selected_sources_json
Quality metadata	retrieved_count, insufficient_context
Model metadata	answer_model, embedding_model
Request metadata	route, client_ip, client_user_agent

Logged-in requests should store user id and email.
Logged-out requests should store a browser-level anonymous id.
Selected sources are saved for later inspection of answer grounding.
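
The row written for each request can be built from the field groups in the table above. This is a sketch only: build_query_event is a hypothetical helper, the shapes of the user and response dicts are assumptions, and the actual insert into MySQL is left out.

```python
import json


def build_query_event(query_text, response, *, answer_model, embedding_model,
                      route, client_ip, client_user_agent,
                      user=None, anonymous_id=None, selected_sources=()):
    """Return one log row as a dict keyed by the column names in the table."""
    if user is None and not anonymous_id:
        raise ValueError("logged-out requests need an anonymous_id")
    return {
        # User fields stay NULL for logged-out requests.
        "user_id": user["id"] if user else None,
        "user_email": user["email"] if user else None,
        "user_tier": user.get("tier") if user else None,
        "anonymous_id": anonymous_id,
        "query_text": query_text,
        "answer_text": response["answer"],
        # Lists are stored as JSON text so they fit a single column.
        "citations_json": json.dumps(response["citations"]),
        "selected_sources_json": json.dumps(list(selected_sources)),
        "retrieved_count": response["retrieved_count"],
        "insufficient_context": bool(response["insufficient_context"]),
        "answer_model": answer_model,
        "embedding_model": embedding_model,
        "route": route,
        "client_ip": client_ip,
        "client_user_agent": client_user_agent,
    }
```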

Local update sequence

Use this sequence when locally rebuilding the full RAG index after source content changes.

docker compose exec frontend npm run lab:export

docker compose exec backend python scripts/lab_export_courses.py

docker compose exec backend python scripts/lab_check_manifests.py

docker compose exec backend python scripts/lab_reindex.py

Production Docker requirements

Production needs Qdrant, a persistent Qdrant volume, backend access to committed manifests, backend access to generated manifests, and a cache volume for the local embedding model.

qdrant:
  image: qdrant/qdrant
  container_name: mlnotebooks-qdrant
  restart: unless-stopped
  volumes:
    - qdrant_data:/qdrant/storage

backend:
  volumes:
    - ./frontend/public/lab-index:/frontend-lab-index:ro
    - ./backend/generated/lab-index:/workspace/generated/lab-index
    - hf_cache:/root/.cache/huggingface

volumes:
  qdrant_data:
  hf_cache:

qdrant_data persists the vector collection.
hf_cache avoids redownloading the embedding model after every rebuild.
No public Qdrant port is required for the backend to use Qdrant.

Production update sequence

Use this sequence when committed manifests or production database sources have changed.

git pull

docker compose --env-file .env.prod -f compose.prod.yaml build
docker compose --env-file .env.prod -f compose.prod.yaml up -d

docker compose --env-file .env.prod -f compose.prod.yaml exec backend python scripts/lab_export_courses.py

docker compose --env-file .env.prod -f compose.prod.yaml exec backend python scripts/lab_check_manifests.py

docker compose --env-file .env.prod -f compose.prod.yaml exec backend python scripts/lab_reindex.py

If only UI or prompt logic changed, skip the export and reindex steps.

When to run each command

Change	Local action	Production action
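Frontend-owned source content changed	Run lab:export, commit JSON, then run lab_check_manifests.py and lab_reindex.py	Pull, then run lab_check_manifests.py and lab_reindex.py
Backend database sources changed	Run lab_export_courses.py, lab_check_manifests.py, and lab_reindex.py	Run lab_export_courses.py, lab_check_manifests.py, and lab_reindex.py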
Only UI changed	No reindex	No reindex
Only prompt changed	No reindex	No reindex
Embedding model changed	Full lab_reindex.py	Full lab_reindex.py
Qdrant volume recreated	Full lab_reindex.py	Full lab_reindex.py
Query logging migration added	Run migration once	Run migration once

Safe reindex checklist

Confirm Qdrant is running.
Confirm frontend manifests exist in /frontend-lab-index.
Export backend database sources from the active environment.
Run lab_check_manifests.py.
Verify all expected source counts are non-zero.
Run lab_reindex.py.
Confirm Qdrant points_count equals the total manifest count.
Ask one question for each major source category.

Common failure: partial index

A partial index happens when reindex runs while one or more manifests are missing. Since reindex recreates the collection, the previous complete index is replaced by whatever sources were visible at that moment.

docker compose exec backend bash -lc "ls -lh /frontend-lab-index"

docker compose exec backend bash -lc "ls -lh /workspace/generated/lab-index"

docker compose exec backend python scripts/lab_check_manifests.py

If required manifests are missing or empty, fix the mount or export step before reindexing.

Common failure: Qdrant name resolution

If reindex fails with Temporary failure in name resolution, the backend cannot resolve the Qdrant service name.

docker compose ps

docker compose exec backend bash -lc 'echo $QDRANT_URL'

docker compose exec backend bash -lc 'python - <<PY
import socket
print(socket.gethostbyname("qdrant"))
PY'

The hostname in QDRANT_URL must match the Qdrant service name in the Compose file.
Use http://qdrant:6333 from inside Docker Compose.
Do not use localhost from inside the backend container unless Qdrant runs in the same container.
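
The same checks can be run from Python inside the backend container. The sketch below parses QDRANT_URL and tries to resolve its hostname; qdrant_host_port and diagnose are hypothetical helper names, and the messages are illustrative.

```python
import socket
from urllib.parse import urlparse


def qdrant_host_port(qdrant_url):
    """Extract the hostname and port that the backend will try to reach."""
    parsed = urlparse(qdrant_url)
    if not parsed.hostname:
        # Typical cause: the scheme is missing, e.g. "qdrant:6333".
        raise ValueError(f"QDRANT_URL has no hostname: {qdrant_url!r}")
    return parsed.hostname, parsed.port  # port is None if the URL omits it


def diagnose(qdrant_url):
    """Return a one-line description of how the URL resolves from this container."""
    host, port = qdrant_host_port(qdrant_url)
    if host in ("localhost", "127.0.0.1"):
        return f"{host} points at the backend container itself, not the Qdrant service"
    try:
        address = socket.gethostbyname(host)
    except socket.gaierror as error:
        return f"cannot resolve {host!r}: {error}"
    return f"{host} resolves to {address}, port {port}"
```

Running diagnose(os.environ["QDRANT_URL"]) in the backend container separates a wrong URL from a service that is not on the same Compose network.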

Common failure: wrong vector size

A vector size mismatch means the collection was created with a different vector dimension than the embedding model produces.

Check the embedding model output dimension.
Check the Qdrant collection vector size.
Recreate the collection with a full reindex after changing the embedding model.
Do not mix vectors from different embedding models in the same collection.
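
A mismatch can be caught before any point is written by checking every embedding against the configured size. This is a minimal sketch; check_vector_sizes is a hypothetical helper, and 384 is the LAB_VECTOR_SIZE value used with BAAI/bge-small-en-v1.5 in this setup.

```python
LAB_VECTOR_SIZE = 384  # output size of BAAI/bge-small-en-v1.5


def check_vector_sizes(vectors, expected_size=LAB_VECTOR_SIZE):
    """Raise before upsert if any embedding has an unexpected dimension."""
    for position, vector in enumerate(vectors):
        if len(vector) != expected_size:
            raise ValueError(
                f"vector {position} has {len(vector)} dimensions, expected {expected_size}"
            )
    return len(vectors)
```

Failing early keeps the error message close to the cause, instead of surfacing later as a rejected upsert or a failed search.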