

Eval Engineering for AI Developers - Lesson 4: Build custom metrics
Learn Eval Engineering in this free, 5-part, hands-on course.
90% of AI agents don't make it to production. The biggest reason is that the AI engineers building these apps lack a clear way to evaluate whether their agents are doing what they should, and to use the results of that evaluation to fix them.
In this course, you will learn all about evals for AI applications. You'll start with some out-of-the-box metrics, then move on to observability for AI apps, analyzing failure states, and defining custom metrics, and finally you'll use all of these across your whole SDLC.
This will be hands-on, so be prepared to write some code, create some metrics, and do some homework!
In this fourth lesson, you will:
Build datasets of known inputs and outputs for cases that pass and fail
Learn how to build custom metrics for your failure cases
Determine how well your metrics work by measuring true and false positives and negatives (see the sketch after this list for a minimal version of all three steps)
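To give you a taste of these three steps before the lesson, here is a minimal sketch in plain Python. Everything in it is a stand-in: the dataset, the refusal-detection metric, and the example outputs are hypothetical, and in the lesson itself you will build metrics on the Galileo platform rather than by hand.

# Step 1: a tiny dataset of known outputs, labeled with the expected verdict.
# should_fail=True means a human judged this output to be a failure case.
dataset = [
    {"output": "Your order #123 ships tomorrow.", "should_fail": False},
    {"output": "I'm sorry, I can't help with that request.", "should_fail": True},
    {"output": "The refund was processed successfully.", "should_fail": False},
    {"output": "Unfortunately, I am unable to assist here.", "should_fail": True},
    {"output": "I can't wait to help! Your ticket is resolved.", "should_fail": False},
]

# Step 2: a custom metric targeting one failure mode (unwanted refusals).
# Real metrics are usually richer (regexes, LLM judges), but the shape is
# the same: output in, boolean or score out.
REFUSAL_PHRASES = ("i'm sorry", "i can't", "unable to assist")

def contains_refusal(output: str) -> bool:
    text = output.lower()
    return any(phrase in text for phrase in REFUSAL_PHRASES)

# Step 3: score the metric against the labels by counting true/false
# positives and negatives, then derive precision and recall.
tp = fp = tn = fn = 0
for row in dataset:
    predicted_fail = contains_refusal(row["output"])
    if predicted_fail and row["should_fail"]:
        tp += 1   # metric flagged a real failure
    elif predicted_fail and not row["should_fail"]:
        fp += 1   # metric flagged a good output (false alarm)
    elif not predicted_fail and row["should_fail"]:
        fn += 1   # metric missed a real failure
    else:
        tn += 1   # metric correctly passed a good output

precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
print(f"TP={tp} FP={fp} TN={tn} FN={fn} precision={precision:.2f} recall={recall:.2f}")

Note the deliberate false positive in the dataset ("I can't wait to help!"): counting it is exactly how you discover that a metric needs refining, which is what this lesson is about.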
Prerequisites:
A basic knowledge of Python
Access to an OpenAI API key
A free Galileo account (we will be using Galileo as the evals platform)
Future lessons
Lesson 5: https://luma.com/esoi6izo