Presented by
Mastra
The open-source TypeScript framework for building AI agents

Beyond vibes: Measuring your agent with evals

Virtual
About Event

Relying on “vibes” to tell if your agent works doesn’t scale—especially once you’re iterating on prompts, models, tools, and RAG. In this workshop, you’ll learn how to turn subjective spot-checking into a clear, repeatable signal for real agent quality using evals (Mastra scorers).

We’ll break down what evals are (and why they’re different from traditional pass/fail tests), then walk through concrete, production-shaped examples. You’ll see practical patterns for getting started, including LLM-as-a-judge, plus how to run live evaluations with sampling so you can monitor quality without slowing down your app.
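
For a concrete flavor of the LLM-as-a-judge pattern, here is a minimal, framework-agnostic sketch in TypeScript. The names (judgeRelevance, maybeScore), the JSON rubric, and the 10% sample rate are illustrative assumptions, not Mastra's scorer API; the workshop will show the Mastra scorers directly.

```ts
type JudgeFn = (prompt: string) => Promise<string>;

interface ScoreResult {
  score: number; // 0 = fails the rubric, 1 = fully satisfies it
  reason: string; // the judge's one-sentence justification
}

// LLM-as-a-judge: ask a model to grade the agent's answer against a rubric.
async function judgeRelevance(
  callModel: JudgeFn, // any chat-completion call (OpenAI, Anthropic, etc.)
  input: string, // the user's question
  output: string, // the agent's answer
): Promise<ScoreResult> {
  const prompt = [
    "Grade how relevant this answer is to the question.",
    'Reply with JSON only: {"score": <number 0..1>, "reason": "<one sentence>"}.',
    `Question: ${input}`,
    `Answer: ${output}`,
  ].join("\n");
  const raw = await callModel(prompt);
  return JSON.parse(raw) as ScoreResult; // assumes the judge returns valid JSON
}

// Live evaluation with sampling: score only a fraction of real traffic so the
// extra judge call doesn't add cost or latency to every request.
async function maybeScore(
  callModel: JudgeFn,
  input: string,
  output: string,
  sampleRate = 0.1, // evaluate roughly 10% of requests
): Promise<ScoreResult | null> {
  if (Math.random() >= sampleRate) return null;
  return judgeRelevance(callModel, input, output);
}
```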

Then we’ll connect evals to the workflow that actually makes them useful: datasets and experiments. You’ll learn how to build a versioned set of test cases, run the same dataset against different agent versions, and compare results to catch regressions and measure improvements with confidence.
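
As a rough sketch of that workflow (the dataset contents, the "v3" version label, and the helper functions here are hypothetical; Mastra's actual dataset and experiment APIs are covered in the session):

```ts
// A versioned set of test cases, run unchanged against different agent versions.
interface TestCase {
  id: string;
  input: string;
  expected?: string; // optional reference answer for the scorer
}

type Agent = (input: string) => Promise<string>;
type Scorer = (input: string, output: string, expected?: string) => Promise<number>;

const datasetV3: TestCase[] = [
  { id: "refund-policy", input: "Can I return an opened item?" },
  { id: "pricing", input: "How much is the pro plan per seat?" },
];

// Run one agent version over the whole dataset and collect per-case scores.
async function runExperiment(agent: Agent, scorer: Scorer, cases: TestCase[]) {
  const rows: { id: string; score: number }[] = [];
  for (const c of cases) {
    const output = await agent(c.input);
    rows.push({ id: c.id, score: await scorer(c.input, output, c.expected) });
  }
  return rows;
}

// Compare two versions on the same dataset; a negative delta flags a regression.
async function compareVersions(before: Agent, after: Agent, scorer: Scorer) {
  const [a, b] = await Promise.all([
    runExperiment(before, scorer, datasetV3),
    runExperiment(after, scorer, datasetV3),
  ]);
  return a.map((row, i) => ({
    id: row.id,
    before: row.score,
    after: b[i].score,
    delta: b[i].score - row.score,
  }));
}
```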

Expect a grounded, no-hype look at where automated evals are powerful, where they can mislead, and how to pair scores with lightweight analysis so you can keep improving your agent over time. There will be time for questions throughout and at the end.

Hosted by
Mastra

Recording and code examples will be available to everyone who registers.
