Evaluating and Improving Agent Search Loops
About the Event
Most retrieval agents fail quietly. The agent pulls something plausible, generates an answer, and only later do you realize the context was incomplete or wrong.
In this hands-on workshop, you'll build and evaluate an agent that recognizes when its retrieval step isn't good enough. Starting from a prebuilt Qdrant-powered agent, you'll run evals, inspect traces, identify failure modes, and add in-loop checks that help the agent decide whether to rewrite the query, search again, or stop.
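To make the "rewrite the query, search again, or stop" decision concrete, here is a minimal sketch of a bounded agent search loop. It is not the workshop's agent code. The `search`, `judge` and `rewrite` callables are hypothetical stand-ins (not Qdrant or workshop APIs). Treating "search again" as doubling the result limit is an assumption made for illustration. The trace records each query and action so failures can be inspected afterwards.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A hit is (doc_id, score). search/judge/rewrite are hypothetical stand-ins,
# not Qdrant or workshop APIs.
Hit = Tuple[str, float]


@dataclass
class LoopTrace:
    """What the loop did at each step, kept for later inspection."""
    queries: List[str] = field(default_factory=list)
    actions: List[str] = field(default_factory=list)


def search_loop(
    query: str,
    search: Callable[[str, int], List[Hit]],
    judge: Callable[[List[Hit]], str],  # returns "stop", "retry" or "rewrite"
    rewrite: Callable[[str], str],
    limit: int = 5,
    max_steps: int = 3,
) -> Tuple[List[Hit], LoopTrace]:
    """Bounded retrieve -> check -> act loop; returns the last hits and a trace."""
    trace = LoopTrace()
    hits: List[Hit] = []
    for _ in range(max_steps):
        trace.queries.append(query)
        hits = search(query, limit)
        action = judge(hits)
        trace.actions.append(action)
        if action == "stop":
            break
        if action == "rewrite":
            query = rewrite(query)
        elif action == "retry":
            # Assumption: "search again" means widening the candidate pool.
            limit *= 2
    return hits, trace


# Toy usage: the original query retrieves weakly, the rewritten one does not.
def fake_search(q: str, limit: int) -> List[Hit]:
    return [("doc-a", 0.82)] if "qdrant" in q else [("doc-x", 0.31)]


def fake_judge(hits: List[Hit]) -> str:
    return "stop" if hits and hits[0][1] >= 0.7 else "rewrite"


def fake_rewrite(q: str) -> str:
    return q + " qdrant"


hits, trace = search_loop("vector db filtering", fake_search, fake_judge, fake_rewrite)
print(trace.actions, hits)  # ['rewrite', 'stop'] [('doc-a', 0.82)]
```

The `max_steps` cap matters as much as the checks: without it, a judge that never says "stop" turns a retrieval failure into an unbounded cost.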
We'll cover practical retrieval-quality signals: low top-result confidence, tightly clustered scores, weak rank gaps, and disagreement between keyword and vector retrieval. Then we'll use those signals to refine the loop and measure whether the changes actually improve answer quality, retrieval quality, cost, and time-to-success.
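The four signals above can each be computed from a single round of results. The following is a rough sketch under stated assumptions, not the workshop's implementation. It assumes higher-is-better similarity scores (for example cosine) from the vector search, and a ranked list of document IDs from a keyword retriever such as BM25. Every threshold is a hypothetical placeholder to be tuned against your own evals.

```python
import statistics
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class RetrievalSignals:
    n_hits: int
    top_score: float     # score of the rank-1 vector hit (confidence)
    score_spread: float  # population std dev of the top-k scores (clustering)
    rank_gap: float      # rank-1 score minus rank-2 score
    overlap: float       # Jaccard overlap of vector vs keyword top-k ids


def compute_signals(
    vector_hits: Sequence[Tuple[str, float]],
    keyword_ids: Sequence[str],
    k: int = 5,
) -> RetrievalSignals:
    """Assumes higher-is-better similarity scores (e.g. cosine)."""
    ranked = sorted(vector_hits, key=lambda h: h[1], reverse=True)[:k]
    scores = [score for _, score in ranked]
    n = len(scores)
    top = scores[0] if n else 0.0
    spread = statistics.pstdev(scores) if n > 1 else 0.0
    gap = scores[0] - scores[1] if n > 1 else 0.0
    vec_ids = {doc_id for doc_id, _ in ranked}
    kw_ids = set(keyword_ids[:k])
    union = vec_ids | kw_ids
    overlap = len(vec_ids & kw_ids) / len(union) if union else 0.0
    return RetrievalSignals(n, top, spread, gap, overlap)


def weak_retrieval_reasons(
    s: RetrievalSignals,
    min_top: float = 0.6,  # hypothetical thresholds: tune them on your evals
    min_spread: float = 0.02,
    min_gap: float = 0.05,
    min_overlap: float = 0.2,
) -> List[str]:
    """Return the names of the signals that look weak (empty list = looks fine)."""
    reasons = []
    if s.top_score < min_top:
        reasons.append("low_top_score")
    if s.n_hits > 1 and s.score_spread < min_spread:
        reasons.append("clustered_scores")
    if s.n_hits > 1 and s.rank_gap < min_gap:
        reasons.append("weak_rank_gap")
    if s.overlap < min_overlap:
        reasons.append("retriever_disagreement")
    return reasons


hits = [("d1", 0.71), ("d2", 0.70), ("d3", 0.69), ("d4", 0.69), ("d5", 0.68)]
keyword = ["d9", "d8", "d1", "d7", "d6"]
print(weak_retrieval_reasons(compute_signals(hits, keyword)))
# ['clustered_scores', 'weak_rank_gap', 'retriever_disagreement']
```

In this example the top score alone looks acceptable. The other three signals show there is no clear winner and the two retrievers largely disagree, which is the kind of quiet failure a single confidence threshold would miss.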
No pre-work required. Every participant gets a ready-to-use VM with the environment, data, agent code, Qdrant setup, traces, and evaluation harness included.
AGENDA
5:00 – 5:30 PM | Arrival, Registration & Networking
Check-in, refreshments, and access to the demo environment
5:30 – 6:00 PM | Opening & Introduction
6:00 – 6:20 PM |
6:20 – 7:20 PM | Talk: TBC
7:20 – 7:50 PM | Talk: TBC
7:50 – 8:20 PM | Talk: TBC
8:20 – 8:50 PM | Wrap-Up & Networking
Key takeaways, Q&A, and networking
What You’ll Learn
How to evaluate retrieval agents beyond final answer correctness
How to inspect traces and identify retrieval-driven failures
How agents can use live retrieval signals to decide when to retry
How to detect weak retrieval using score confidence, variance, rank gaps, and retriever disagreement
How query rewriting can improve or harm retrieval
How to measure whether an extra loop step (step N+1) actually beats the previous one (step N)
How to balance answer quality, latency, token cost, and tool-call cost
How Qdrant serves as the retrieval layer for adaptive agent systems
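Measuring whether step N+1 beats step N, while keeping latency, token cost, and tool-call cost in view, can be framed as a paired comparison over the same eval queries. This is a minimal sketch, not the workshop's evaluation harness. The `QueryResult` record, the 0/1 quality score, and the `step_is_worth_it` policy with its thresholds are all hypothetical. Counting per-query wins and losses, not just the mean, exposes cases where a rewrite or extra search helped some queries but hurt others.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class QueryResult:
    """Cumulative outcome for one eval query after a given loop step."""
    quality: float  # e.g. 1.0 if the answer was judged correct, else 0.0
    latency_s: float
    tokens: int
    tool_calls: int


def compare_steps(step_n: List[QueryResult], step_n1: List[QueryResult]) -> Dict[str, float]:
    """Paired comparison: the same eval queries, stopped after step N vs step N+1."""
    if len(step_n) != len(step_n1):
        raise ValueError("results must be paired query-by-query")
    pairs = list(zip(step_n, step_n1))
    return {
        "quality_delta": mean(b.quality - a.quality for a, b in pairs),
        "wins": sum(b.quality > a.quality for a, b in pairs),
        "losses": sum(b.quality < a.quality for a, b in pairs),
        "extra_latency_s": mean(b.latency_s - a.latency_s for a, b in pairs),
        "extra_tokens": mean(b.tokens - a.tokens for a, b in pairs),
        "extra_tool_calls": mean(b.tool_calls - a.tool_calls for a, b in pairs),
    }


def step_is_worth_it(report: Dict[str, float], min_gain: float = 0.02,
                     max_extra_tokens: float = 1500) -> bool:
    # Hypothetical policy: a real quality gain, more wins than losses, and a
    # bounded token cost. Latency or tool-call budgets could be added the same way.
    return (
        report["quality_delta"] >= min_gain
        and report["wins"] > report["losses"]
        and report["extra_tokens"] <= max_extra_tokens
    )


before = [QueryResult(0.0, 1.2, 800, 1), QueryResult(1.0, 1.0, 700, 1),
          QueryResult(1.0, 1.1, 750, 1), QueryResult(0.0, 1.3, 900, 1)]
after = [QueryResult(1.0, 2.3, 1600, 2), QueryResult(1.0, 1.0, 700, 1),
         QueryResult(0.0, 2.0, 1500, 2), QueryResult(1.0, 2.4, 1700, 2)]
report = compare_steps(before, after)
print(report["wins"], report["losses"], report["quality_delta"], step_is_worth_it(report))
# 2 1 0.25 True
```

Note the third query: the extra step turned a correct answer into a wrong one. A mean-only report would hide that regression, which is exactly how query rewriting can harm retrieval without anyone noticing.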
WHO SHOULD ATTEND
AI engineers and agent developers who want to move beyond "retrieve and generate" and learn how to evaluate and improve agent behavior step by step. Best fit for people who have built basic agent workflows and want stronger tools for debugging retrieval failures, query rewriting, and stopping decisions.
Seats are limited.