Presented by
Galileo Events
Calendar of events for AI evaluation company Galileo
67 Went

Behind the Benchmarks: Building Agent Leaderboard 2.0

YouTube
Past Event
About Event

Join the Galileo team for an exclusive livestream walkthrough of the latest updates to the Galileo Agent Leaderboard, the benchmark for evaluating the real-world performance of AI agents.

We’ll unpack what’s changed (and why it matters), spotlight emerging leaders across evaluation categories like planning, tool use, and multi-turn reasoning, and reveal key insights.

This session is perfect for:

  • AI engineers working with agentic systems

  • Builders evaluating complex multi-step agents

  • Researchers curious about the edge of capability evaluations

  • Teams interested in how self-reflection and domain adherence are scored at scale

🔍 What you'll learn:

  • How Galileo measures agent reliability in the wild

  • Behind-the-scenes updates

  • Surprising model performance shifts and new failure modes

  • What's next for the leaderboard

  • How to make the most of this resource

🎤 Featuring: Pratik Bhavasar, creator of the leaderboard, and Erin Mikail Staples, Sr. Developer Experience Engineer.

💬 Live Q&A and leaderboard AMA at the end.
