Hosted By
Cornell AI Alignment
A community of students and researchers conducting research and outreach to mitigate risks from advanced AI systems.
18 Went

Investigating RL for Interpretability with Caleb Biddulph (MATS, Google Gemini)

Past Event
About Event

We’re thrilled to host Caleb Biddulph: a MATS 8.1 scholar (mentored by Micah Carroll at OpenAI), a former software engineer at Google Gemini, and the founding president of Cornell Effective Altruism.

Attendees will receive copies of “MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking,” a paper Caleb co-authored. We’ll then dive into a discussion of his current research on RL-driven prompt discovery and interpretability, as well as his insights on doing good technical research and pursuing a career in AI safety and alignment.

Catered dinner will be provided—come for the food and great conversation!

Location
Bowers Hall, Room 250 (new CIS building)