
Anti-Scheming: how would we train a model not to scheme? (Bronson Schoen and Marius Hobbhahn, Apollo Research)

About Event

This is the fifth event of UK AI Forum's Artificial Agency speaker series, and our keynote speaker event of the season.

Bronson Schoen (senior research engineer, Apollo Research) and Marius Hobbhahn (CEO, Apollo Research) will present joint work by Apollo Research and OpenAI: ‘Stress Testing Deliberative Alignment for Anti-Scheming Training’. The work attempted to use deliberative alignment to teach o3 and o4-mini general principles against being deceptive.

Bronson and Marius will discuss their findings, including (a) how well the training works, (b) its effects on the models' situational awareness, and (c) evidence that o3's chain of thought is no longer always human-interpretable, among other results.

Location
Department of Engineering, Trumpington St, Cambridge CB2 1PZ, UK
The talk will be held in the Constance Tipper Lecture Theatre.