

Measuring What Works: Agent Evals, Context Quality, and Optimization
If you can’t measure it, you can’t improve it, and that’s especially true for agents.
Most teams rely on vibes, anecdotes, or raw model benchmarks to judge agent performance. That breaks down fast in real developer workflows.
This session goes deep on evaluation and optimization. We’ll show how to define meaningful grading criteria and measure what actually improves agent outcomes in production.
You’ll learn how to evaluate agent performance, quantify the impact of different context packages, and turn failures into a continuous improvement loop.
Expect a practical view of what “agent performance” really means.
What you’ll learn
How to design realistic, repeatable agent evaluation tasks (a minimal sketch follows this list)
Grading criteria that reflect real developer success
Ways to measure the impact of docs, rules, and examples on outcomes
Turning production failures into feedback that improves context over time
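To make the idea concrete, here is a minimal, self-contained sketch of what a repeatable eval with an explicit grading criterion can look like. It is illustrative only: the task, the run_agent stub, and the grade_add check are hypothetical stand-ins, not Tessl tooling or material from the session.

from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalTask:
    name: str
    prompt: str                       # the task given to the agent
    grade: Callable[[str], bool]      # grading criterion: did the output succeed?

# Stand-in for a real agent call (e.g. a coding agent invoked through its API).
def run_agent(prompt: str) -> str:
    return "def add(a, b):\n    return a + b"

# Functional grading criterion: execute the agent's code and check its behaviour
# rather than string-matching the output.
def grade_add(output: str) -> bool:
    scope: dict = {}
    try:
        exec(output, scope)
        return scope["add"](2, 3) == 5
    except Exception:
        return False

TASKS = [
    EvalTask(
        name="write_add_function",
        prompt="Write a Python function add(a, b) that returns the sum of a and b.",
        grade=grade_add,
    ),
]

def run_suite(tasks: list[EvalTask]) -> float:
    # Run every task once and report the fraction that pass their grader.
    passed = sum(task.grade(run_agent(task.prompt)) for task in tasks)
    return passed / len(tasks)

if __name__ == "__main__":
    print(f"pass rate: {run_suite(TASKS):.0%}")

Even a toy harness like this makes the grading criterion explicit and functional (the generated code must actually run and return the right answer), which is the kind of repeatable measurement you need before comparing the impact of different docs, rules, or examples.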
Speakers
Dru Knox
Head of Product, Tessl
Dru leads Product at Tessl. He brings deep experience in platform and ecosystem development, having helped build two of the largest developer platforms in the world: Android and the web. His work sits at the intersection of product design, developer experience, and systems thinking. Outside of work, he’s drawn to design, game theory, and a bit of armchair philosophy.
Maria Gorinova
Member of Technical Staff, Tessl
Maria is an AI Research Engineer at Tessl. Her experience spans machine learning and computer science, including probabilistic programming, variational inference, graph neural networks, geometric deep learning, programming language design, and program analysis, with applications across science, healthcare, and social media.
Who this is for
Engineers, researchers, platform teams, and technical leaders who want evidence-based answers to what actually makes agents better.