🦄 ai that works: Evals Revisited!
A critical piece of building AI into a software factory is knowing whether it's actually working and where it's failing. This week, we dig into the practical side of designing evaluations for AI systems embedded in software development pipelines. We'll cover how to define what "good" looks like when AI is writing code, reviewing PRs, or generating tests, and how to build evals that are repeatable, automated, and meaningful at scale.
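As a taste of what "repeatable and automated" can mean in practice, here is a minimal sketch of an eval for AI-generated code. Everything in it is an illustrative assumption (the `run_eval` helper, the `CASES` table, and the sample model output), not part of any specific framework discussed in the episode: the idea is simply to grade a model's output against a fixed set of test cases so the same check can run on every pipeline commit.

```python
# Minimal sketch of an automated, repeatable eval for AI-generated code.
# All names (run_eval, CASES, model_output) are illustrative assumptions.

# Pretend this string came back from a model asked to write an `add` function.
model_output = """
def add(a, b):
    return a + b
"""

# Fixed (args, expected) cases make the eval repeatable run-to-run.
CASES = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

def run_eval(code: str, cases) -> float:
    """Execute the generated code and return the fraction of cases it passes."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # load the generated function
    except Exception:
        return 0.0  # generated code did not even parse/run
    fn = namespace.get("add")
    if fn is None:
        return 0.0  # model did not define the requested function
    passed = 0
    for args, expected in cases:
        try:
            if fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing case counts as a failure, not an eval error
    return passed / len(cases)

score = run_eval(model_output, CASES)
print(f"pass rate: {score:.0%}")  # prints "pass rate: 100%"
```

Because the score is a single number over a fixed case set, it can be tracked over time and gated in CI, which is what makes an eval like this meaningful at scale.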
Meet the Speakers 🧑‍💻
Meet Vaibhav Gupta, one of the creators of BAML and a YC alum. He spent 10 years in AI performance optimization at places like Google, Microsoft, and D. E. Shaw. He loves diving deep and chatting about anything related to Gen AI and computer vision!
Meet Dex Horthy, founder at HumanLayer and coiner of the term Context Engineering. He spent 10+ years building DevOps tools at Replicated, Sprout Social, and JPL. A DevOps junkie turned AI engineer.