

Beyond vibes: Measuring your agent with evals
Relying on “vibes” to see if your agent works doesn’t scale. In this session, you’ll learn how to build a clear, repeatable signal for how your agent really performs, using evals.
Hosted by:
- Alex Booker (Developer Educator @ Mastra)
- Yujohn Nattrass (Software Engineer, working on Evals @ Mastra)
We’ll break down what evals are, how they differ from traditional tests, and how they show up in real systems, with concrete examples and demos.
You’ll see practical patterns for getting started, including LLM-as-a-Judge (sketched below), and how to use evals to reliably iterate on agents powered by techniques like RAG.
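To give a flavour of the LLM-as-a-Judge pattern ahead of the session, here is a minimal sketch in TypeScript. It calls the OpenAI Node SDK directly; the rubric, model choice, and the judgeAnswer helper are illustrative assumptions for this example, not Mastra’s built-in eval API.

```ts
// Minimal LLM-as-a-Judge sketch (illustrative only, not Mastra's built-in API).
// Assumes the OpenAI Node SDK; swap in your provider of choice.
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

interface JudgeResult {
  score: number;  // 0 (fails the rubric) to 1 (fully meets it)
  reason: string; // short justification, useful when reviewing failures
}

// Ask a judge model to grade an agent's answer against a simple relevance rubric.
async function judgeAnswer(question: string, answer: string): Promise<JudgeResult> {
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini", // any JSON-capable chat model works here
    messages: [
      {
        role: "system",
        content:
          "You are an evaluator. Score the answer for relevance to the question " +
          'on a 0-1 scale. Respond with JSON: {"score": number, "reason": string}.',
      },
      { role: "user", content: `Question: ${question}\nAnswer: ${answer}` },
    ],
    response_format: { type: "json_object" },
  });
  return JSON.parse(response.choices[0].message.content ?? "{}") as JudgeResult;
}

// Example: run the judge over a single eval case and log the graded result.
judgeAnswer("What do evals measure?", "They score an agent's outputs against a rubric.")
  .then((result) => console.log(result));
```

In the session we’ll look at how this same idea shows up in built-in evals, and where a raw judge like this can mislead without further analysis.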
We’ll also go deeper into why rigorous evaluation matters: where built-in evals are powerful, where they break down or mislead, and how to pair them with data analysis and a technique called open coding to truly understand and improve your agent’s behaviour over time.
Expect a grounded, no-hype look at automated evals, designed for people who ship, with time for questions throughout and at the end.
Recording and code examples will be available to everyone who registers.