Hosted By
Cornell AI Alignment
A community of students and researchers conducting research and outreach to mitigate risks from advanced AI systems.
18 Went

Investigating RL for Interpretability with Caleb Biddulph (MATS, Google Gemini)

Past Event
About Event

We’re thrilled to host Caleb Biddulph: a MATS 8.1 scholar (mentored by Micah Carroll at OpenAI), a former software engineer at Google Gemini, and the founding president of Cornell Effective Altruism.

Attendees will receive copies of “MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking,” a paper Caleb co-authored. We’ll then dive into a discussion of his current research on RL-driven prompt discovery and interpretability, as well as his insights on doing good technical research and pursuing a career in AI safety and alignment.

Catered dinner will be provided—come for the food and great conversation!

Location
Bowers Hall, Room 250 (new CIS building)