CAIA Speaker Event: Gabriel Wu, OpenAI (Virtual)

Name: CAIA Speaker Event: Gabriel Wu, OpenAI (Virtual)
Start: 2026-02-27T17:00:00.000-08:00
End: 2026-02-27T18:00:00.000-08:00
Location: Tianqiao and Chrissy Chen Neuroscience Research building

Hosted by Ayushi Mehrotra

Pasadena, California

About Event

Who: Gabriel Wu, OpenAI (Virtual)
When: February 27, 5–6 pm PT
Where: Chen 100, Caltech

Title: Teaching LLMs to Confess

Abstract: We train GPT-5 to self-report misbehavior by producing an auxiliary “confession message” that receives an independent reward during RL. We find that models are typically honest in their confessions, and this honesty increases with training. We will also discuss connections between our approach and standard chain-of-thought monitoring, and whether we expect confessions to work on more egregiously misaligned models.

Bio: Gabriel Wu is a researcher on the Alignment team at OpenAI, where he works on training models to follow human instructions more reliably. Previously, he worked at the Alignment Research Center and led the AI Safety Student Team at Harvard.

Everyone is welcome: no specific technical background is required. Come learn and ask questions. And yes, we will have pizza and boba.

Location

Tianqiao and Chrissy Chen Neuroscience Research building

S Wilson Ave & E Del Mar Blvd, Pasadena, CA 91106, USA
