

Behind the Benchmarks: Building Agent Leaderboard 2.0
Join the Galileo team for an exclusive livestream walkthrough of the latest updates to the Galileo Agent Leaderboard—the benchmark for evaluating the real-world performance of AI agents.
We’ll unpack what’s changed (and why it matters), spotlight emerging leaders across evaluation categories like planning, tool use, and multi-turn reasoning, and share key insights from the latest results.
This session is perfect for:
AI engineers working with agentic systems
Builders evaluating complex multi-step agents
Researchers curious about the edge of capability evaluations
Teams interested in how self-reflection and domain adherence are scored at scale
🔍 What you'll learn:
How Galileo measures agent reliability in the wild
Behind-the-scenes updates to the leaderboard and its evaluation methodology
Surprising model performance shifts and new failure modes
What's next for the leaderboard
How to make the most of this resource
🎤 Featuring: Pratik Bhavasar, creator of the Leaderboard, and Erin Mikail Staples, Sr. Developer Experience Engineer.
💬 Live Q&A at the end — bring your leaderboard questions.