Presented by
Braintrust
The evals and observability platform for building reliable AI agents.
Measure what matters: Intro to AI evals for common use cases

Zoom
About Event

This session will break down the basics of AI evals through practical examples suitable for both AI engineers and PMs.

We'll work through how to evaluate three common use cases:

  • Customer support agent: "Is my AI support agent ready for customers?"

  • Content/code generation: "How do I know if my AI's output is actually good?"

  • Prompt optimization and model testing: "Which version of my AI setup works better?"

We'll also cover how to write good scoring functions and manage datasets. No prior evaluation experience is required, and the approaches are framework- and model-agnostic, so they work with any AI application.
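To preview the kind of scoring function the session covers, here is a minimal, hedged sketch in plain Python. It is not Braintrust-specific: the function names (`exact_match`, `keyword_coverage`) and signatures are illustrative assumptions, but they follow the common pattern of a scorer that takes a model's output (plus some reference) and returns a score between 0 and 1.

```python
# Illustrative scoring functions (hypothetical names, not a specific SDK's API).
# A scorer typically compares the model's output against a reference
# and returns a float in [0.0, 1.0].

def exact_match(output: str, expected: str) -> float:
    """1.0 if the output matches the expected answer, ignoring case and whitespace."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

def keyword_coverage(output: str, required_keywords: list[str]) -> float:
    """Fraction of required keywords that appear in the output (case-insensitive)."""
    if not required_keywords:
        return 1.0
    found = sum(1 for kw in required_keywords if kw.lower() in output.lower())
    return found / len(required_keywords)

# Example: scoring a support-agent reply against required talking points.
score = keyword_coverage(
    "You can reset your password from the account settings page.",
    ["reset", "password", "settings"],
)
```

Simple deterministic scorers like these work for checks with clear right answers; for open-ended outputs (e.g. content generation), the session's use cases call for richer judgments, such as LLM-based grading.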
