

Fresh Context Chai: The Domain Expert Eval Problem
Registration
Approval Required
Your registration is subject to host approval.
About Event
Shipped an AI agent used by non-devs? Now how do we tell if it is actually working?
The only people who can really judge the outputs are domain experts such as lawyers, doctors, or underwriters, and they are often slow and always expensive.
So we find hacks and build whatever makes things work: an LLM grading another LLM, or a spreadsheet of test cases that an in-house expert updates when they have time.
But how and when can we trust them?
Let's find some answers. A small group of founding engineers will work through this together, under the Chatham House Rule.
Hosted by Exosphere
A new series by serial hosts: