Presented by AI Native Dev

Measuring What Works: Agent Evals, Context Quality, and Optimization

Virtual
Registration
Welcome! To join the event, please register below.
About Event

If you can’t measure it, you can’t improve it, especially with agents.

Most teams rely on vibes, anecdotes, or raw model benchmarks to judge agent performance. That breaks down fast in real developer workflows.

This session goes deep on evaluation and optimization. We’ll show how to define meaningful grading criteria and measure what actually improves agent outcomes in production.

You’ll learn how to evaluate agent performance, quantify the impact of different context packages, and turn failures into a continuous improvement loop.

Expect a practical view of what “agent performance” really means.

What you’ll learn

  • How to design realistic, repeatable agent evaluation tasks (see the sketch after this list)

  • Grading criteria that reflect real developer success

  • Ways to measure the impact of docs, rules, and examples on outcomes

  • Turning production failures into feedback that improves context over time
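
To give a flavor of the format, here is a minimal sketch of what a repeatable evaluation task can look like in code: each task pairs a prompt with a set of programmatic graders, and the per-task pass rate gives you a number to track across runs. Everything below is illustrative only; `run_agent`, `EvalTask`, and the grader lambdas are hypothetical placeholders, not any specific framework's API.

```python
# Illustrative sketch of an agent eval harness; all names are placeholders.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalTask:
    name: str
    prompt: str
    graders: list[Callable[[str], bool]]  # each grader checks one success criterion

def run_suite(run_agent: Callable[[str], str], tasks: list[EvalTask]) -> dict[str, float]:
    """Run every task, grade the agent's output, and return a per-task pass rate."""
    scores = {}
    for task in tasks:
        output = run_agent(task.prompt)
        passed = sum(grader(output) for grader in task.graders)
        scores[task.name] = passed / len(task.graders)
    return scores

# Example: one repeatable task whose criteria mirror real developer success.
tasks = [
    EvalTask(
        name="fix-failing-test",
        prompt="The test in tests/test_parser.py fails; fix the bug.",
        graders=[
            lambda out: "def parse" in out,   # the change mentions the parser itself
            lambda out: "tests/" not in out,  # the agent didn't just edit the test
        ],
    ),
]
```

Re-running the same suite with a different context package, such as added docs, rules, or examples, and comparing pass rates is one straightforward way to quantify that package's impact on outcomes.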

Speakers

Dru Knox

Head of Product, Tessl

Dru leads Product at Tessl. He brings deep experience in platform and ecosystem development, having helped build two of the largest developer platforms in the world, Android and the web. His work sits at the intersection of product design, developer experience, and systems thinking. Outside of work, he’s drawn to design, game theory, and a bit of armchair philosophy.


Maria Gorinova

Member of Technical Staff, Tessl

Maria is an AI Research Engineer at Tessl. Her experience spans machine learning and computer science, including probabilistic programming, variational inference, graph neural networks, geometric deep learning, programming language design, and program analysis, with applications across science, healthcare, and social media.

Who this is for

Engineers, researchers, platform teams, and technical leaders who want evidence-based answers to what actually makes agents better.
