Presented by
Boundary
We make BAML, a programming language for using LLMs. Some event recordings are available here: https://github.com/hellovai/ai-that-works
Hosted By
87 Going

🦄 ai that works: Evals Revisited!

Virtual
About Event

A critical piece of building AI into a software factory is knowing whether it's actually working and where it's failing. This week, we dig into the practical side of designing evaluations for AI systems embedded in software development pipelines. We'll cover how to define what "good" looks like when AI is writing code, reviewing PRs, or generating tests, and how to build evals that are repeatable, automated, and meaningful at scale.

Meet the Speakers 🧑‍💻

Meet Vaibhav Gupta, one of the creators of BAML and a YC alum. He spent 10 years in AI performance optimization at places like Google, Microsoft, and D. E. Shaw. He loves diving deep and chatting about anything related to Gen AI and Computer Vision!

Meet Dex Horthy, founder at HumanLayer and coiner of the term "Context Engineering". He spent 10+ years building DevOps tools at Replicated, Sprout Social, and JPL. DevOps junkie turned AI Engineer.
