Build a Production-Style RAG System in Go — From Zero to Streaming Chat
Learn Retrieval-Augmented Generation by building a RAG system yourself, in plain Go, against any OpenAI-compatible model, local or hosted.
Stop reading about RAG and start shipping it. In this hands-on course you will build a complete, end-to-end Retrieval-Augmented Generation system from the ground up using the Go programming language. No Python. No LangChain. No magical abstractions. Just clear, idiomatic Go code that you can read, modify, and own.
By the end of the course, you will have a working application featuring a streaming terminal chat REPL, a browser-based chat UI with token-by-token Server-Sent Events, file and image uploads, a background filesystem watcher that ingests documents automatically, an evaluation harness that scores retrieval quality, and a Postgres + pgvector backend running in Docker.
Why this course?
Most RAG tutorials hide the interesting parts behind a framework. You wire three lines of someone else's library together, it works, and you have no idea what just happened. When something breaks in production — and it will — you are stuck.
This course takes the opposite approach. Every component is built explicitly, with clean seams between concepts so you can see exactly where the LLM client ends and the vector store begins. The package layout maps directly to lecture chapters. The interfaces between the LLM, the embedder, the vector store, the retriever, the chat loop, and the web server are deliberately exposed so you can swap pieces in and out as exercises.
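To make the idea of "clean seams" concrete, here is a minimal sketch of how such interfaces might look. Every name in it (`Embedder`, `VectorStore`, `Retriever`, `lengthEmbedder`, `fixedStore`, `demo`) is a hypothetical illustration, not the course's actual package API, and the two toy implementations only stand in for a real embedding model and a real pgvector-backed store.

```go
package main

import (
	"context"
	"fmt"
)

// Embedder turns text into a vector. Hypothetical interface for illustration.
type Embedder interface {
	Embed(ctx context.Context, text string) ([]float32, error)
}

// VectorStore saves chunks and returns up to k of them for a query vector.
// Also hypothetical; a pgvector-backed type could satisfy the same interface.
type VectorStore interface {
	Upsert(ctx context.Context, id, text string, vec []float32) error
	Search(ctx context.Context, query []float32, k int) ([]string, error)
}

// Retriever depends only on the two interfaces, never on a concrete backend.
type Retriever struct {
	Embedder Embedder
	Store    VectorStore
	K        int
}

// Retrieve embeds the query and asks the store for up to K chunks.
func (r Retriever) Retrieve(ctx context.Context, query string) ([]string, error) {
	vec, err := r.Embedder.Embed(ctx, query)
	if err != nil {
		return nil, fmt.Errorf("embed query: %w", err)
	}
	return r.Store.Search(ctx, vec, r.K)
}

// lengthEmbedder is a toy stand-in for a real embedding model.
type lengthEmbedder struct{}

func (lengthEmbedder) Embed(_ context.Context, text string) ([]float32, error) {
	return []float32{float32(len(text))}, nil
}

// fixedStore is a toy stand-in: it appends without de-duplicating by id
// and ignores the query vector, returning the first k stored chunks.
type fixedStore struct{ chunks []string }

func (s *fixedStore) Upsert(_ context.Context, _, text string, _ []float32) error {
	s.chunks = append(s.chunks, text)
	return nil
}

func (s *fixedStore) Search(_ context.Context, _ []float32, k int) ([]string, error) {
	if k > len(s.chunks) {
		k = len(s.chunks)
	}
	return s.chunks[:k], nil
}

// demo wires the toy pieces together exactly as real ones would be wired.
func demo() ([]string, error) {
	ctx := context.Background()
	emb, store := lengthEmbedder{}, &fixedStore{}
	docs := []string{"Go is a compiled language.", "pgvector adds vector search to Postgres."}
	for i, text := range docs {
		vec, err := emb.Embed(ctx, text)
		if err != nil {
			return nil, err
		}
		if err := store.Upsert(ctx, fmt.Sprintf("doc-%d", i), text, vec); err != nil {
			return nil, err
		}
	}
	r := Retriever{Embedder: emb, Store: store, K: 1}
	return r.Retrieve(ctx, "what is Go?")
}

func main() {
	chunks, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range chunks {
		fmt.Println(c)
	}
}
```

Because `Retriever` only sees the interfaces, replacing `fixedStore` with a database-backed store, or `lengthEmbedder` with a call to a hosted model, would not require touching `Retrieve`.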
This is the course I wish existed when I was learning RAG.
What you will build
A small but real RAG application with all the moving parts of a production system:
- A streaming chat REPL (Read-Eval-Print Loop) in the terminal with a "thinking" spinner and proper history management
- A web chat UI built with chi, Go templates, and Tailwind, streaming tokens to the browser over SSE with in-browser markdown rendering
- A background filesystem watcher that detects new documents, chunks them, embeds them, and upserts them into pgvector — then moves the originals out of the way
- A synchronous file upload path on the web UI for drag-and-drop ingest with chunk-count feedback
- An image upload pipeline with optional auto-captioning by a vision-capable model, served back to the browser and rendered inline in chat
- A paragraph-aware chunker with configurable size and overlap
- A query rewriter that turns multi-turn conversation into a standalone search query before retrieval
- A retriever with cosine-similarity filtering, top-K hit selection, and pluggable backends
- A pgvector + Postgres 18 vector store with idempotent migrations, HNSW indexing, and a delete-by-source path that keeps re-ingest clean
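As a rough illustration of the retriever item in the list above, the sketch below scores stored vectors against a query by cosine similarity, drops anything under a minimum score, and keeps the best K. It is a brute-force, in-memory version written for clarity; the `Hit` type, the `topK` function, and the threshold value are assumptions made for this example, and a pgvector backend would do the equivalent work inside Postgres.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Hit pairs a chunk with its similarity to the query. Hypothetical type.
type Hit struct {
	Text  string
	Score float64
}

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 if either vector has zero magnitude.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK scores every chunk, drops those below minScore,
// and returns at most k hits ordered from best to worst.
func topK(query []float64, texts []string, vecs [][]float64, k int, minScore float64) []Hit {
	var hits []Hit
	for i, v := range vecs {
		if s := cosine(query, v); s >= minScore {
			hits = append(hits, Hit{Text: texts[i], Score: s})
		}
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].Score > hits[j].Score })
	if len(hits) > k {
		hits = hits[:k]
	}
	return hits
}

func main() {
	// Two-dimensional toy vectors; real embeddings have hundreds of dimensions.
	texts := []string{"east", "north", "north-east", "west"}
	vecs := [][]float64{{1, 0}, {0, 1}, {1, 1}, {-1, 0}}
	for _, h := range topK([]float64{1, 0}, texts, vecs, 2, 0.5) {
		fmt.Printf("%-10s %.3f\n", h.Text, h.Score)
	}
}
```

With the query pointing east, only "east" and "north-east" clear the 0.5 threshold, so "north" and "west" never reach the prompt even though K would allow more results in a larger corpus.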
What you will learn
- How a RAG pipeline actually works end-to-end: chunking, embedding, vector search, query rewriting, context injection, and streaming generation
- How to design Go interfaces so the LLM, the embedder, and the vector store are swappable without touching the rest of the codebase
- How to stream LLM tokens both to a terminal and, via Server-Sent Events, to a browser
- How to run everything against **OpenAI, Ollama, LM Studio, or Groq** — and how to mix and match (e.g. hosted chat with local embeddings)
- How to use Postgres + pgvector for production-grade vector search, including HNSW indexes and embedding-dimension migrations
- How to ingest documents reactively with `fsnotify`, including debouncing half-written files and making re-ingest idempotent
- How to handle multimodal content: image upload, vision-model captioning, and image rendering in chat
- How to debug "why didn't the model use my docs?"
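To give a feel for the browser-streaming item in the list above, here is a minimal sketch of a Server-Sent Events handler using only the standard library. The `streamTokens` and `render` names and the `/chat/stream` path are made up for this example, and a fixed slice of strings stands in for a live model stream. Because the handler is a plain `http.HandlerFunc`, it should mount on any `net/http`-compatible router, chi included.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// streamTokens returns a handler that writes each token as one SSE event.
// The token slice stands in for a live LLM stream (an assumption of this sketch).
func streamTokens(tokens []string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")

		for _, tok := range tokens {
			// Stop early if the client has gone away.
			if r.Context().Err() != nil {
				return
			}
			// A newline inside a token must continue on a new "data:" line.
			payload := strings.ReplaceAll(tok, "\n", "\ndata: ")
			fmt.Fprintf(w, "data: %s\n\n", payload)
			flusher.Flush() // push the event to the client immediately
		}
	}
}

// render runs the handler against an in-memory recorder and returns the raw body,
// so the wire format can be inspected without starting a server.
func render(tokens []string) string {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/chat/stream", nil)
	streamTokens(tokens).ServeHTTP(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Print(render([]string{"Hello", ",", " world"}))
}
```

Each event is a `data:` line followed by a blank line, and the explicit `Flush` after every token is what makes the output arrive token by token instead of in one buffered response.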
What makes this course different
- Real code, not pseudo-code. Every example in the course is from a working, runnable project.
- Local-first. You can complete the entire course with Ollama on your laptop. No API bills required.
- Honest about tradeoffs. The course covers known limitations (chunker is token-blind, delete-then-upsert is not transactional, image retrieval is description-based) so you understand the design space, not just one fixed answer.
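The "token-blind" limitation mentioned above is easy to see in a small paragraph-aware chunker. The sketch below is an illustrative assumption, not the course's implementation: it splits on blank lines, packs paragraphs up to a soft size limit, and seeds each new chunk with the tail of the previous one. Sizes are counted in runes, so the chunker has no idea how many model tokens a chunk will actually consume.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// tail returns the last n runes of s, or all of s if it is shorter.
func tail(s string, n int) string {
	r := []rune(s)
	if n >= len(r) {
		return s
	}
	return string(r[len(r)-n:])
}

// Chunk packs blank-line-separated paragraphs into chunks and never splits
// a paragraph. A chunk is closed when adding the next paragraph would push
// it past size runes; the next chunk then starts with the last overlap runes
// of the closed one. size is a soft limit: the overlap seed or a single long
// paragraph can exceed it. Assumes 0 <= overlap < size.
func Chunk(text string, size, overlap int) []string {
	var chunks []string
	cur := ""
	for _, p := range strings.Split(text, "\n\n") {
		p = strings.TrimSpace(p)
		if p == "" {
			continue
		}
		// +2 accounts for the blank-line separator added below.
		if cur != "" && utf8.RuneCountInString(cur)+2+utf8.RuneCountInString(p) > size {
			chunks = append(chunks, cur)
			cur = tail(cur, overlap)
		}
		if cur != "" {
			cur += "\n\n"
		}
		cur += p
	}
	if cur != "" {
		chunks = append(chunks, cur)
	}
	return chunks
}

func main() {
	for i, c := range Chunk("aaaa\n\nbbbb\n\ncccc", 10, 2) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```

A token-aware variant would replace the rune counts with counts from the target model's tokenizer, which is exactly the kind of swap the tradeoff discussion invites.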
Tech stack you will use
Go, Postgres 18, pgvector, Docker Compose, chi router, Go templates, Tailwind, Server-Sent Events, fsnotify, and OpenAI-compatible APIs (OpenAI, Ollama, LM Studio, Groq, and others).
Course outcome
When you finish this course, you will have a portfolio-quality RAG application running on your machine, a deep understanding of how every layer works, and the confidence to drop the same architecture into a real product at work. You will know what to measure, what to swap, and what to leave alone.
Enrol now and start building your own RAG system today.