Building a RAG Application in Python

Learn Retrieval-Augmented Generation by building one in Python, against any OpenAI-compatible model — local or hosted

Course Summary

Build a working Retrieval-Augmented Generation (RAG) application in Python — from an empty directory to a streaming web chat with multi-turn memory, hybrid retrieval, image ingestion, and two interchangeable vector-store backends. No LangChain, no LlamaIndex, no magic. You write every line yourself, and by the end you understand exactly what each one does.  

Most RAG tutorials wrap everything in a single high-level library and stop at "it works." This course goes the other way. You'll build the pipeline from scratch — chunking, embeddings, idempotent ingestion, hybrid semantic-plus-lexical retrieval with Reciprocal Rank Fusion, a query rewriter for follow-up questions, server-sent token streaming, a vision-model branch for images — on top of plain Postgres (with pgvector) and a local Ollama server. No API bills while you learn. No black boxes. When you later reach for a framework like LangChain, you'll actually understand what it's doing under the hood.

What you'll build, in one project:

  • Runs entirely locally against Ollama, or transparently against the OpenAI API by changing one environment variable 
  • Stores embeddings in Postgres + pgvector with HNSW indexing, or in Weaviate — backends swappable via a single config setting 
  • Hybrid retrieval: dense vector search and Postgres full-text BM25, fused with Reciprocal Rank Fusion — fixing the cases where pure semantic search silently fails on rare terms, names, and identifiers 
  • A directory watcher that ingests new files automatically, with editor-save debouncing so it never reads a half-written file 
  • A streaming web chat UI built on FastAPI + Server-Sent Events + vanilla JavaScript (no React, no build step), with multi-turn memory, query rewriting for follow-ups, source citations, and inline image rendering
  • Image ingestion through a vision model with a "describe-then-embed" pipeline: multimodal in the same chunks table, no schema change required
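
To give a feel for the hybrid-retrieval step, here is a minimal sketch of Reciprocal Rank Fusion. It is not the course's code: the function name, the chunk IDs, and the constant k=60 (a commonly used default) are assumptions for illustration. Each ranked list contributes 1 / (k + rank) to a chunk's score, so a chunk that ranks well in both the dense and the lexical list rises to the top.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list, best first.

    rankings: iterable of lists, each ordered from most to least relevant.
    k: smoothing constant (assumed default; 60 is a common choice).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrievers.
dense = ["chunk-a", "chunk-b", "chunk-c"]    # vector search
lexical = ["chunk-d", "chunk-a", "chunk-b"]  # full-text search
fused = reciprocal_rank_fusion([dense, lexical])
print(fused)
```

In this toy input, chunk-a appears near the top of both lists and so outranks chunk-d, which only the lexical search found.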

Along the way you'll work through real software-design patterns in real code: Dependency Injection, Strategy/Adapter, Factory, lifespans, context managers, thread-safety boundaries, atomic transactions, and defensive coding against external services that quietly don't work the way their docs claim. The course's recurring theme is the payoff of good abstractions: the vector-store interface designed early lets you bolt on a second backend in one file; the same retrieval pipeline serves both the CLI and the web app; the chunk-metadata field that seemed academic early in the course is what makes image support a simple change later on.
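
As a rough illustration of the "swappable backend" idea, the sketch below pairs a small interface with a factory. Everything named here is hypothetical (VectorStore, InMemoryStore, make_store, the "memory" backend key), and the in-memory class is only a stand-in for a real pgvector or Weaviate adapter; the course's actual interface may look quite different.

```python
import math
from typing import Protocol


class VectorStore(Protocol):
    """Hypothetical interface every backend would implement."""

    def add(self, chunk_id: str, embedding: list[float]) -> None: ...
    def search(self, query: list[float], top_k: int) -> list[str]: ...


class InMemoryStore:
    """Toy backend standing in for a real database-backed adapter."""

    def __init__(self) -> None:
        self._rows: dict[str, list[float]] = {}

    def add(self, chunk_id: str, embedding: list[float]) -> None:
        self._rows[chunk_id] = embedding

    def search(self, query: list[float], top_k: int) -> list[str]:
        def cosine(a: list[float], b: list[float]) -> float:
            norm = math.hypot(*a) * math.hypot(*b)
            return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

        ranked = sorted(
            self._rows,
            key=lambda cid: cosine(query, self._rows[cid]),
            reverse=True,
        )
        return ranked[:top_k]


def make_store(backend: str) -> VectorStore:
    """Factory: map a config value to a backend class (names are assumed)."""
    factories = {"memory": InMemoryStore}
    if backend not in factories:
        raise ValueError(f"unknown vector-store backend: {backend!r}")
    return factories[backend]()


store = make_store("memory")
store.add("x", [1.0, 0.0])
store.add("y", [0.0, 1.0])
print(store.search([0.9, 0.1], top_k=1))
```

Under this design, adding a second backend means writing one more class with the same two methods and registering it in the factory; callers that depend only on VectorStore do not change.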

You'll finish with a codebase you can extend — add a reranker, try a different embedder, swap the chat model, point it at a corpus of your own docs — and the engineering vocabulary to talk about RAG as production software, not a notebook demo.

Course Curriculum

Trevor Sawler

Trevor has more than 20 years of experience in professional software development, and more than 30 years of experience as a university professor. As an entrepreneur, he has worked with a broad range of clients, including Thomson Nelson, Hewlett Packard, the Royal Bank of Canada, KeyBank, Sprint, and many others. He also has extensive management and project management experience. He has led teams of fifty developers and artists on multi-million dollar projects, and much smaller teams on much smaller projects. Trevor continues to work on projects for a variety of clients every day. As a professor, he has taught in a wide variety of course areas, including computer science; English, Irish, and American literature; and a number of "crossover" courses that bridge the liberal arts and technological fields. He has won regional, national, and international awards for his work in the IT field, and has also won awards for his teaching and research as a university professor.

Course Pricing

One payment

$24.99 CAD
