DataTalks.Club is a global online community of people who love data.

RAG and Agents Evaluation: Measuring Retrieval and LLM Answer Quality

About Event

This is the fourth workshop in our series updating the LLM Zoomcamp content.

This workshop updates Module 4: Evaluation.

In this hands-on session, Alexey Grigorev will show how to evaluate retrieval and answer quality in a RAG application.

You’ll learn how to create ground truth data, evaluate search results, compare generated answers, and use both embedding-based metrics and LLM-as-a-Judge for offline evaluation.

What you’ll learn:

  • Why evaluation is important for LLM applications

  • What can go wrong in RAG systems without systematic evaluation

  • How to create ground truth data for retrieval evaluation

  • How to use an LLM to generate evaluation data

  • How to evaluate text search results

  • How ranking metrics work for retrieval evaluation

  • How to compare offline and online evaluation

  • How to generate data for offline RAG evaluation

  • How to use embeddings and cosine similarity to compare answers

  • How to compare answers from different models

  • How to use LLM-as-a-Judge for answer evaluation

  • How to evaluate answers with A→Q→A’ and Q→A approaches
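To give a taste of the ranking-metrics part of the session, here is a minimal sketch of two common retrieval metrics, hit rate and MRR (Mean Reciprocal Rank). The function names and the boolean-relevance representation are illustrative, not necessarily what the workshop notebooks use:

```python
def hit_rate(relevance):
    # relevance: one list of booleans per query, True where the
    # retrieved document matches the ground-truth document
    hits = sum(1 for line in relevance if True in line)
    return hits / len(relevance)

def mrr(relevance):
    # Mean Reciprocal Rank: average of 1 / (rank of first relevant hit),
    # 0 for queries with no relevant result in the top-k
    total = 0.0
    for line in relevance:
        for rank, is_relevant in enumerate(line):
            if is_relevant:
                total += 1.0 / (rank + 1)
                break
    return total / len(relevance)

# Example: 3 queries, top-3 search results each
relevance = [
    [True, False, False],   # hit at rank 1
    [False, False, True],   # hit at rank 3
    [False, False, False],  # miss
]
print(hit_rate(relevance))  # ≈ 0.667
print(mrr(relevance))       # (1 + 1/3 + 0) / 3 ≈ 0.444
```

Hit rate only asks whether the right document was retrieved at all; MRR also rewards retrieving it closer to the top.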

By the end, you’ll understand how to measure the quality of a RAG system instead of relying only on manual testing. You’ll have notebooks and datasets for evaluating both retrieval and generated answers.
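For the answer-comparison side, the core idea is cosine similarity between embedding vectors. Below is a small sketch with toy vectors standing in for the embeddings of an original answer (A) and an LLM-generated answer (A'); in practice you would obtain these from an embedding model of your choice:

```python
import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors:
    # 1.0 = same direction, 0.0 = orthogonal (unrelated)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy stand-ins for real embeddings of two answers
a_orig = np.array([0.2, 0.7, 0.1])
a_llm = np.array([0.25, 0.65, 0.15])

print(cosine_similarity(a_orig, a_llm))
```

Averaging this similarity over a set of question-answer pairs gives a single offline score for how close generated answers stay to the ground truth.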

Like the other workshops, this will be a live demo with practical tips and time for Q&A.


All events in this series:

  1. Build Your First RAG Application with LLMs

  2. From RAG to AI Agents: Function Calling and Tool Use

  3. Vector Databases: Embeddings, Semantic Search, and Hybrid Retrieval

  4. RAG and Agents Evaluation: Measuring Retrieval and LLM Answer Quality

  5. Monitoring LLM Applications: Traces, Feedback, and Production Quality


Thinking about Joining LLM Zoomcamp?

This workshop covers the updated content for Module 4 of the LLM Zoomcamp, our free course on building practical LLM applications with RAG, vector search, evaluation, monitoring, and AI agents.

You start with a simple RAG pipeline, then improve it with better retrieval, semantic search, function calling, evaluation, monitoring, and production practices.

The course covers the full lifecycle of an LLM application: from the first working prototype to evaluation, monitoring, and a complete final project.

The new cohort of LLM Zoomcamp starts on June 8, 2026. You can join it by registering here.

About the Speaker

Alexey Grigorev is the Founder of DataTalks.Club and creator of the Zoomcamp series.

Alexey is a software and ML engineer with over 10 years in engineering and 6+ years in machine learning. He has deployed large-scale ML systems at companies like OLX Group and Simplaex, authored several technical books, including Machine Learning Bookcamp, and is a Kaggle Master with a 1st place finish in the NIPS’17 Criteo Challenge.

DataTalks.Club is the place to talk about data. Join our Slack community!
