

10 Things I Hate About Evals with Hamel Husain
Hamel Husain is one of the world’s foremost AI engineers, helping teams across the industry build and ship reliable AI systems. He also teaches the #1 AI Evals course in the world and has helped over 2,000 builders level up their evals skills for production use cases.
But in this livestream, we’re going to flip the script: we’ll dig into the things he hates about evals, and what that says about how to actually make them work.
Evals are supposed to keep us honest about whether AI systems are working. Too often, though, they mislead us, distract us, or generate noise instead of clarity.
In this session, Hamel and I will run through the biggest mistakes he sees teams make when it comes to evals—from relying on generic, off-the-shelf metrics to automating away the wrong things. Along the way, we’ll talk about better ways to approach evaluation, how to involve the right expertise, and what it takes to align evals with real product outcomes.
What we’ll discuss
Why generic metrics often miss the mark
The dangers of dashboards and automation that hide real errors
When prompts, data, and domain expertise matter more than adding more tools
Why using LLMs as judges can backfire without human grounding
Building intuition about what AI is (and isn’t) good at
Register to join live, bring your own “eval hates,” and learn how to design evals that actually improve AI systems.
If you can't make it, register and we'll send you the recording.