
Presented by

Snorkel AI (snorkel.ai) develops the datasets, benchmarks, and evaluation methods that help AI and agentic systems learn, adapt, and perform in the real world.


Reading Group (+🧋): Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration

Start: 2026-06-03T15:00:00.000-07:00
End: 2026-06-03T17:30:00.000-07:00
Location: 101 Second Street

Snorkel AI Community Events

101 Second Street

San Francisco, CA

Approval Required

Your registration is subject to host approval.


About Event

Join the Snorkel AI Reading Group, a recurring forum to explore the latest frontier developments in AI while building meaningful connections within the community.

In this afternoon session, Yijia Shao, a PhD candidate at Stanford NLP, will cover work she recently presented at ICLR: Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration.

Agenda:

3pm - doors open
3:30pm - talk begins

🧋🧋🧋 Boba tea and other refreshments will be provided! 🧋🧋🧋

Among other things, you'll learn:

How Co-Gym formalizes human–agent collaboration as a POMDP with dual control over a shared workspace.
Why enforced turn-taking breaks down for real tasks, and what replaces it: two collaboration acts plus a notification protocol.
How an evaluation suite scores both outcomes and process, including an entropy-based measure of shared initiative.
Why a Collaborative Agent with Situational Planning beats fully autonomous baselines across three tasks.
How real users rate the best collaborative agent: win rates over the fully autonomous agent of 86% on Travel, 74% on Tabular, and 66% on Related Work tasks.
Why communication and situational awareness remain the dominant failure modes, appearing in 65% and 40% of real-user trajectories, respectively.
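This page does not define the entropy-based measure of shared initiative. As a rough intuition only, the sketch below assumes initiative is scored as the normalized Shannon entropy of how initiative-taking actions are split between the human and the agent. The function name `initiative_entropy` and its inputs are hypothetical, and Co-Gym's actual definition may differ.

```python
import math
from collections import Counter


def initiative_entropy(initiators, parties=("human", "agent")):
    """Normalized Shannon entropy of who took initiative (hypothetical sketch).

    `initiators` lists, for each initiative-taking action in a trajectory,
    which party took it. Returns a value in [0, 1]: 0 when one party takes
    all the initiative, 1 when it is split evenly across all parties.
    Labels not listed in `parties` are ignored.
    """
    n = len(parties)
    if n < 2:
        raise ValueError("need at least two parties")
    counts = Counter(initiators)
    total = sum(counts[p] for p in parties)
    if total == 0:
        return 0.0  # no initiative-taking actions recorded
    h = 0.0
    for p in parties:
        share = counts[p] / total
        if share > 0:
            # Logarithm base n keeps the result in [0, 1] for n parties.
            h -= share * math.log(share, n)
    return h


if __name__ == "__main__":
    print(initiative_entropy(["agent"] * 4))                         # agent drives everything
    print(initiative_entropy(["human", "agent"]))                    # evenly shared
    print(initiative_entropy(["agent", "agent", "agent", "human"]))  # mostly agent-led
```

Under this assumption, a fully autonomous agent scores 0, an even split scores 1, and a 3:1 split scores about 0.81, so the number summarizes how balanced the initiative was rather than how good the outcome was.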
