🦄 ai that works: Evals Revisited!
A critical piece of building AI into a software factory is knowing whether it's actually working and where it's failing. This week, we dig into the practical side of designing evaluations for AI systems embedded in software development pipelines. We'll cover how to define what "good" looks like when AI is writing code, reviewing PRs, or generating tests, and how to build evals that are repeatable, automated, and meaningful at scale.
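As a taste of what "repeatable and automated" can mean in practice, here is a minimal sketch of an eval for AI-generated code. Everything in it is an illustrative assumption (the `run_eval` helper, the `CASES` table, and the sample model output), not part of any specific framework discussed in the episode: the idea is simply to grade a model's output against a fixed set of test cases so the same check can run on every pipeline commit.

```python
# Minimal sketch of an automated, repeatable eval for AI-generated code.
# All names (run_eval, CASES, model_output) are illustrative assumptions.

# Pretend this string came back from a model asked to write an `add` function.
model_output = """
def add(a, b):
    return a + b
"""

# Fixed (args, expected) cases make the eval repeatable run-to-run.
CASES = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

def run_eval(code: str, cases) -> float:
    """Execute the generated code and return the fraction of cases it passes."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # load the generated function
    except Exception:
        return 0.0  # generated code did not even parse/run
    fn = namespace.get("add")
    if fn is None:
        return 0.0  # model did not define the requested function
    passed = 0
    for args, expected in cases:
        try:
            if fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing case counts as a failure, not an eval error
    return passed / len(cases)

score = run_eval(model_output, CASES)
print(f"pass rate: {score:.0%}")  # prints "pass rate: 100%"
```

Because the score is a single number over a fixed case set, it can be tracked over time and gated in CI, which is what makes an eval like this meaningful at scale.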
Meet the Speakers 🧑‍💻
Meet Vaibhav Gupta, one of the creators of BAML and a YC alum. He spent 10 years in AI performance optimization at places like Google, Microsoft, and D. E. Shaw. He loves diving deep and chatting about anything related to Gen AI and computer vision!
Meet Dex Horthy, founder at HumanLayer and coiner of the term Context Engineering. He spent 10+ years building DevOps tools at Replicated, Sprout Social, and JPL. A DevOps junkie turned AI engineer.