
RAG: indexing, retrieval, and evaluation

An end-to-end Retrieval-Augmented Generation (RAG) reference implementation built on LangChain, OpenAI, and Chroma. Three pipelines (indexing, retrieval and generation, and evaluation) read their configuration from one shared module, and the evaluation pipeline grades answers in three complementary ways. Source on GitHub.


Indexing

The indexing pipeline loads a web document, splits it into chunks with overlap, embeds each chunk with OpenAI's text-embedding-3-small, and persists the result to a local Chroma vectorstore.

Indexing pipeline: load, split, embed, store
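
To make the splitting step concrete, here is a minimal plain-Python sketch of chunking with overlap. It is a stand-in for illustration, not the repo's code: the function name split_with_overlap and the default chunk_size and overlap values are assumptions, and a LangChain text splitter would normally split on separators rather than fixed character offsets.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters.

    Hypothetical helper for illustration; sizes are assumed, not taken from the repo.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in split_with_overlap("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.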

Retrieval and generation

The RAG pipeline takes a user question, retrieves the most similar chunks from the vectorstore, and passes them to a chat model together with a constrained prompt. The chain composes as retriever → prompt → model → string output, which is the idiomatic LangChain pattern.

RAG pipeline: retrieve, prompt, generate
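
The retrieve and prompt stages can be sketched without any framework. The example below is a simplified illustration under stated assumptions: the names cosine, top_k, and build_prompt are hypothetical, the vectors are toy values rather than real embeddings, k=4 is an assumed default, and the prompt wording is a guess at what "constrained" means here. In the actual pipeline the vectorstore retriever and the LangChain prompt template do this work.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], indexed: list[tuple[str, list[float]]], k: int = 4) -> list[str]:
    """Return the text of the k chunks whose vectors are most similar to the query."""
    ranked = sorted(indexed, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt that restricts the model to the retrieved context (assumed wording)."""
    context = "\n\n".join(chunks)
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    # Toy two-dimensional "embeddings" purely for demonstration.
    index = [("chunk about indexing", [1.0, 0.0]),
             ("chunk about evaluation", [0.0, 1.0]),
             ("chunk about both", [1.0, 1.0])]
    print(build_prompt("How does indexing work?", top_k([1.0, 0.0], index, k=2)))
```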

Evaluation: three lenses

A separate evaluation pipeline grades each answer in three complementary ways. The lenses often disagree, and those disagreements are exactly where evaluation pays off.

  • Code-based. Deterministic, free, instant. Length bounds, keyword presence, structural rules. Catches format regressions; blind to meaning.
  • Model-as-judge. A separate LLM scores relevance and faithfulness against the retrieved context. Catches semantic problems the code checks cannot, at the cost of an extra API call.
  • Human-grade. The ground truth the other two only approximate. Auto-skips when run non-interactively.
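
As a rough sketch of the code-based lens, the checks below cover the three kinds of rule named above: length bounds, keyword presence, and one structural rule. Everything here is an assumption for illustration: the CheckResult type, the code_checks function, the thresholds, and the choice of "ends with punctuation" as the structural rule are not taken from the repo.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool


def code_checks(answer: str, required_keywords: list[str],
                min_chars: int = 20, max_chars: int = 1200) -> list[CheckResult]:
    """Run deterministic checks on an answer. Thresholds are assumed values."""
    text = answer.strip()
    lowered = text.lower()
    return [
        # Length bounds: catches empty or runaway answers.
        CheckResult("length_bounds", min_chars <= len(text) <= max_chars),
        # Keyword presence: case-insensitive substring match for every keyword.
        CheckResult("keywords_present",
                    all(k.lower() in lowered for k in required_keywords)),
        # Structural rule (hypothetical): the answer ends as a complete sentence.
        CheckResult("ends_with_punctuation", text.endswith((".", "!", "?"))),
    ]


if __name__ == "__main__":
    for result in code_checks("Chroma persists embeddings to local disk.", ["chroma"]):
        print(f"{result.name}: {'pass' if result.passed else 'fail'}")
```

Checks like these pass a fluent but wrong answer without complaint, which is the gap the model-as-judge and human lenses are there to cover.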

What I would change for production

The repo README spells out the migration path to production. Each item below comes with its own tradeoff.

  • Observability. LangSmith tracing so each retrieval and generation becomes an inspectable span.
  • Cost guards. Rate limiting, token budget caps, and embedding-call dedupe for unchanged source documents.
  • Async indexing. A background worker for large corpora so the request path stays snappy.
  • Citations. Returning which retrieved chunks contributed to each answer for trust and debugging.
  • Tests. Unit tests for chain composition; a smoke test against a tiny fixed corpus.
  • Schema versioning. Versioning the collection by embedding model and chunk size as the vectorstore evolves.
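
Two of the items above, embedding-call dedupe and schema versioning, are small enough to sketch. This is one possible approach under stated assumptions, not the README's design: content_hash, needs_embedding, and collection_name are hypothetical helpers, the in-memory dict stands in for whatever persistent store would hold the hashes, and the naming scheme is invented for illustration.

```python
import hashlib
import re


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_embedding(doc_id: str, text: str, seen_hashes: dict[str, str]) -> bool:
    """Return True only when the document is new or its content changed.

    `seen_hashes` maps doc_id -> last embedded hash; a real system would persist it.
    """
    digest = content_hash(text)
    if seen_hashes.get(doc_id) == digest:
        return False  # unchanged source: skip the embedding call
    seen_hashes[doc_id] = digest
    return True


def collection_name(base: str, embedding_model: str, chunk_size: int) -> str:
    """Encode the embedding model and chunk size in the collection name (assumed scheme)."""
    slug = re.sub(r"[^a-z0-9]+", "-", embedding_model.lower()).strip("-")
    return f"{base}-{slug}-cs{chunk_size}"


if __name__ == "__main__":
    seen: dict[str, str] = {}
    print(needs_embedding("page-1", "hello", seen))
    print(needs_embedding("page-1", "hello", seen))
    print(collection_name("docs", "text-embedding-3-small", 1000))
```

Putting the model and chunk size in the name means a change to either one creates a fresh collection instead of mixing incompatible vectors into an existing one.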

Run it yourself

Clone the repo, set OPENAI_API_KEY, install the requirements, and run the three pipelines in sequence. Full instructions live in the README on GitHub.