Avatar for Snorkel AI Community Events

Reading Group (+🧋): Agents' Last Exam

Registration
Approval Required
Your registration is subject to host approval.
About Event

​Join the Snorkel AI Reading Group, a recurring forum to explore the latest frontier developments in AI while building meaningful connections within the community.

In this afternoon session, Yiyou Sun and Xinyang Han, Postdoctoral Researchers at UC Berkeley, will cover their recent paper: Agents' Last Exam.

​Agenda:

​4 pm - doors open
4:30 pm - talk begins

🧋🧋🧋 Boba tea and other refreshments will be provided! 🧋🧋🧋

​Among other things, you'll learn:

  • ​ALE is a benchmark designed to evaluate AI agents on long-horizon, economically valuable, real-world tasks with verifiable outcomes—developed in collaboration with 250+ industry experts and covering 1,000+ tasks across 55 subfields in 13 industry clusters.

  • Widely used benchmarks do not measure sustained performance on real, economically valuable workflows, which creates a systematic gap between benchmark success and meaningful deployment across professional domains.

  • ​ALE grounds task coverage in O*NET / SOC 2018, the U.S. federal occupational taxonomy, ensuring systematic, reproducible coverage of non-physical job categories at scale.

  • ​The hardest task tier remains far from saturated—across mainstream harness and backbone configurations, the average full pass rate is just 2.6%, underscoring the substantial headroom that remains.

  • ​ALE's task pool grows continuously as new workflows and industries are onboarded, enabling longitudinal tracking of agent capabilities rather than one-time snapshot comparisons.

  • ​ALE is intended not merely as another leaderboard, but as an instrument for closing the gap between benchmark performance and GDP-relevant economic impact.

Agents' Last Exam is a collaboration between UC Berkeley's RDI (Center for Responsible Decentralized Intelligence), Snorkel AI, and 250+ experts across academia and industry.

Location
101 Second Street
San Francisco, CA 94105, USA