Presented by
Mastra
The open-source TypeScript framework for building AI agents

Beyond vibes: Measuring your agent with evals

Virtual
About Event

Relying on “vibes” to tell if your agent works doesn’t scale—especially once you’re iterating on prompts, models, tools, and RAG. In this workshop, you’ll learn how to turn subjective spot-checking into a clear, repeatable signal for real agent quality using evals (Mastra scorers).

We’ll break down what evals are (and why they’re different from traditional pass/fail tests), then walk through concrete, production-shaped examples. You’ll see practical patterns for getting started, including LLM-as-a-judge, plus how to run live evaluations with sampling so you can monitor quality without slowing down your app.
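
For a concrete flavor of the LLM-as-a-judge pattern, here is a minimal, framework-agnostic sketch in TypeScript. The names (judgeRelevance, maybeScore), the JSON rubric, and the 10% sample rate are illustrative assumptions, not Mastra's scorer API; the workshop will show the Mastra scorers directly.

```ts
type JudgeFn = (prompt: string) => Promise<string>;

interface ScoreResult {
  score: number; // 0 = fails the rubric, 1 = fully satisfies it
  reason: string; // the judge's one-sentence justification
}

// LLM-as-a-judge: ask a model to grade the agent's answer against a rubric.
async function judgeRelevance(
  callModel: JudgeFn, // any chat-completion call (OpenAI, Anthropic, etc.)
  input: string, // the user's question
  output: string, // the agent's answer
): Promise<ScoreResult> {
  const prompt = [
    "Grade how relevant this answer is to the question.",
    'Reply with JSON only: {"score": <number 0..1>, "reason": "<one sentence>"}.',
    `Question: ${input}`,
    `Answer: ${output}`,
  ].join("\n");
  const raw = await callModel(prompt);
  return JSON.parse(raw) as ScoreResult; // assumes the judge returns valid JSON
}

// Live evaluation with sampling: score only a fraction of real traffic so the
// extra judge call doesn't add cost or latency to every request.
async function maybeScore(
  callModel: JudgeFn,
  input: string,
  output: string,
  sampleRate = 0.1, // evaluate roughly 10% of requests
): Promise<ScoreResult | null> {
  if (Math.random() >= sampleRate) return null;
  return judgeRelevance(callModel, input, output);
}
```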

Then we’ll connect evals to the workflow that actually makes them useful: datasets and experiments. You’ll learn how to build a versioned set of test cases, run the same dataset against different agent versions, and compare results to catch regressions and measure improvements with confidence.
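
As a rough sketch of that workflow (the dataset contents, the "v3" version label, and the helper functions here are hypothetical; Mastra's actual dataset and experiment APIs are covered in the session):

```ts
// A versioned set of test cases, run unchanged against different agent versions.
interface TestCase {
  id: string;
  input: string;
  expected?: string; // optional reference answer for the scorer
}

type Agent = (input: string) => Promise<string>;
type Scorer = (input: string, output: string, expected?: string) => Promise<number>;

const datasetV3: TestCase[] = [
  { id: "refund-policy", input: "Can I return an opened item?" },
  { id: "pricing", input: "How much is the pro plan per seat?" },
];

// Run one agent version over the whole dataset and collect per-case scores.
async function runExperiment(agent: Agent, scorer: Scorer, cases: TestCase[]) {
  const rows: { id: string; score: number }[] = [];
  for (const c of cases) {
    const output = await agent(c.input);
    rows.push({ id: c.id, score: await scorer(c.input, output, c.expected) });
  }
  return rows;
}

// Compare two versions on the same dataset; a negative delta flags a regression.
async function compareVersions(before: Agent, after: Agent, scorer: Scorer) {
  const [a, b] = await Promise.all([
    runExperiment(before, scorer, datasetV3),
    runExperiment(after, scorer, datasetV3),
  ]);
  return a.map((row, i) => ({
    id: row.id,
    before: row.score,
    after: b[i].score,
    delta: b[i].score - row.score,
  }));
}
```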

Expect a grounded, no-hype look at where automated evals are powerful, where they can mislead, and how to pair scores with lightweight analysis so you can keep improving your agent over time. There will be time for questions throughout and at the end.

Hosted by
Mastra

Recording and code examples will be available to everyone who registers.
