

Presented by
Cornell AI Alignment
A community of students and researchers conducting research and outreach to mitigate risks from advanced AI systems.
Investigating RL for Interpretability with Caleb Biddulph (MATS, Google Gemini)
About Event
We’re thrilled to host Caleb Biddulph, a MATS 8.1 scholar (mentored by Micah Carroll at OpenAI), former software engineer at Google Gemini, and founding president of Cornell Effective Altruism.
Attendees will receive copies of MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking, a paper Caleb co-authored. We'll then dive into a discussion of his current research on RL-driven prompt discovery and interpretability, along with his advice on doing good technical research and pursuing careers in AI safety and alignment.
Catered dinner will be provided—come for the food and great conversation!
Location
Bowers Hall, Room 250 (new CIS building)