Braintrust is the AI observability platform for shipping quality AI products.

Production traces capture where your AI falls short and what users are trying to do. Building evals from that data is how you catch failures earlier and make better calls about what ships next.

In this session, Amanda Gilbert shows how to take the patterns Braintrust surfaces automatically, turn them into a labeled eval dataset, and run the same workflow every time a new pattern shows up.

Online workshop: Build evals from real production data

Omar Olivares Urrutia

Aryan Singhal

Meredith Wade

Kirill Sofronov

Nicole Kim

Adwait Joshi

Robert Amanfu

Anthony Chen

Ashka Stephen

Viya
