
alphaXiv

🗓 Thursday, October 2, 2025 · 11 AM PT

Shashwat Goel joins us to discuss how AI evaluations need to change in tandem with LLM capabilities. He will present his work on generative and long-horizon evaluations in the era of reasoning agents. He will also share his perspective on the new capability evaluations we need on the path toward generally intelligent agents. Shashwat is a PhD student co-advised by Jonas Geiping and Douwe Kiela through the ELLIS program at the Max Planck Institute for Intelligent Systems.

https://stanford.zoom.us/j/91936389736?pwd=GI5Kibcjl6UaN9IQOLShBthyaiOIbL.1&from=addon

AI Evals w/ Shashwat Goel: Measuring AI progress requires rethinking evaluations

William Chen

Abby Barnes

Valério Cardoso

Evans Ocansey

Raimundo Saona

Xiangyi Li

Nitin Pasumarthy

Jagadish Gandhi

Mindy

Wes Simpson

new-spirit

Vals AI
