162 Went

AI Evals w/ Mike Merrill — Terminal Bench: A benchmark for AI agents in terminal environments

Hosted by Vals AI & alphaXiv
Zoom
Past Event
About Event

When: Thursday, Oct 9, 11 am PT

Where: Zoom (link created by alphaXiv); the recording will later be uploaded to the alphaXiv YouTube channel

Zoom link: https://stanford.zoom.us/j/95904059062?pwd=0ErKmwUCab6qBSNls8oUhmeF1pzeIo.1&from=addon

🔬 AI Evals on alphaXiv
🗓 Thursday, October 9, 2025 · 11 AM PT
🎙 Featuring Mike Merrill
💬 Casual Talk + Open Discussion

Terminal-Bench: A benchmark for AI agents in terminal environments

We are excited to have Mike Merrill join us to discuss his work on Terminal-Bench, a widely used benchmark for evaluating AI agents in terminal environments. He will also present his broader work and perspectives on evaluations. Mike is a Postdoctoral Researcher in Computer Science at Stanford, working with Ludwig Schmidt on empirical evaluations of reasoning LLMs.


Hosted by: alphaXiv x Vals AI

AI Evals: join the community
