

Behind the Benchmarks: Building Agent Leaderboard 2.0
Join the Galileo team for an exclusive livestream walkthrough of the latest updates to the Galileo Agent Leaderboard—the benchmark for evaluating the real-world performance of AI agents.
We’ll unpack what’s changed (and why it matters), spotlight emerging leaders across evaluation categories like planning, tool use, and multi-turn reasoning, and share key insights from the latest results.
This session is perfect for:
AI engineers working with agentic systems
Builders evaluating complex multi-step agents
Researchers curious about the edge of capability evaluations
Teams interested in how self-reflection and domain adherence are scored at scale
🔍 What you'll learn:
How Galileo measures agent reliability in the wild
Behind-the-scenes updates to the leaderboard and its evaluation methodology
Surprising model performance shifts and new failure modes
What's next for the leaderboard
How to make the most of this resource
🎤 Featuring: Pratik Bhavasar, creator of the Leaderboard, and Erin Mikail Staples, Sr. Developer Experience Engineer.
💬 Live Q&A at the end — bring your leaderboard questions.