

Build Your First RAG Application with LLMs
This is the first workshop in our series updating the LLM Zoomcamp content.
This workshop updates Module 1: Introduction to LLMs and RAG.
In this hands-on session, Alexey Grigorev will show how to build a basic Retrieval-Augmented Generation pipeline for answering questions about course FAQ documents.
You’ll index FAQ documents from the Zoomcamp courses, retrieve relevant entries, and use the OpenAI API to generate answers based on the retrieved context.
What you’ll learn:
What LLMs are and how they are used in question-answering systems
What Retrieval-Augmented Generation is and why it’s useful
How a basic RAG architecture works
How to prepare a Python environment for an LLM application
How to index FAQ documents from Zoomcamp courses
How to implement keyword search with MinSearch
How to build prompts with retrieved context
How to generate answers with the OpenAI API
How to refactor the RAG pipeline into modular code
How to replace MinSearch with Elasticsearch for a more realistic retrieval setup
How to run Elasticsearch with Docker and search indexed documents
By the end, you’ll have a working RAG pipeline that answers questions using FAQ documents from Data Engineering Zoomcamp, Machine Learning Zoomcamp, and MLOps Zoomcamp.
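To give a rough idea of how the pieces fit together, here is a minimal sketch of the search, prompt, and generate flow. It is not the workshop's code: the sample FAQ entries, the toy word-overlap `search`, the prompt wording, and the `llm` callable are all assumptions made for illustration. In the workshop, retrieval is done with MinSearch and later Elasticsearch, and generation with the OpenAI API.

```python
from typing import Callable

# Hypothetical FAQ entries; the workshop indexes real Zoomcamp FAQ documents.
DOCUMENTS = [
    {
        "course": "data-engineering-zoomcamp",
        "question": "Can I still join the course after it has started?",
        "text": "Yes, you can still join and submit the remaining homework.",
    },
    {
        "course": "machine-learning-zoomcamp",
        "question": "How do I install Docker?",
        "text": "Follow the official Docker installation guide for your system.",
    },
]


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric words."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


def search(query: str, docs: list, num_results: int = 1) -> list:
    """Toy keyword search: rank documents by the number of shared words."""
    query_words = tokenize(query)
    scored = []
    for doc in docs:
        doc_words = tokenize(doc["question"] + " " + doc["text"])
        overlap = len(query_words & doc_words)
        if overlap > 0:
            scored.append((overlap, doc))
    # Highest overlap first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:num_results]]


def build_prompt(query: str, results: list) -> str:
    """Put the retrieved FAQ entries into the prompt as context."""
    context = "\n\n".join(
        f"Q: {doc['question']}\nA: {doc['text']}" for doc in results
    )
    return (
        "Answer the QUESTION using only the CONTEXT from the FAQ.\n\n"
        f"QUESTION: {query}\n\n"
        f"CONTEXT:\n{context}"
    )


def rag(query: str, llm: Callable[[str], str], docs: list = DOCUMENTS) -> str:
    """Retrieve, build the prompt, and pass it to the LLM callable."""
    results = search(query, docs)
    prompt = build_prompt(query, results)
    return llm(prompt)
```

Passing the model in as a plain `llm` callable keeps the sketch runnable without an API key. For example, `rag("Can I join the course late?", llm=lambda prompt: prompt)` simply returns the prompt that would be sent to the model.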
Like the other workshops, this will be a live demo with practical tips and time for Q&A.
All events in this series:
Vector Databases: Embeddings, Semantic Search, and Hybrid Retrieval
RAG and Agents Evaluation: Measuring Retrieval and LLM Answer Quality
Monitoring LLM Applications: Traces, Feedback, and Production Quality
Thinking about Joining LLM Zoomcamp?
This workshop covers the updated content for Module 1 of the LLM Zoomcamp, our free course on building practical LLM applications with RAG, vector search, evaluation, monitoring, and AI agents.
You start with a simple RAG pipeline, then improve it with better retrieval, semantic search, function calling, evaluation, monitoring, and production practices.
The course covers the full lifecycle of an LLM application: from the first working prototype to evaluation, monitoring, and a complete final project.
The new cohort of LLM Zoomcamp starts on June 8, 2026. You can join it by registering here.
About the Speaker
Alexey Grigorev is the Founder of DataTalks.Club and creator of the Zoomcamp series.
Alexey is a software and ML engineer with over 10 years in engineering and 6+ years in machine learning. He has deployed large-scale ML systems at companies like OLX Group and Simplaex, authored several technical books, including Machine Learning Bookcamp, and is a Kaggle Master with a 1st place finish in the NIPS’17 Criteo Challenge.
DataTalks.Club is the place to talk about data. Join our Slack community!