

F2025 - Technical Paper Reading Group Week 6 - Control
UBC AI Safety Technical Paper Reading Group
UBC AI Safety has launched a biweekly technical paper reading group focused on cutting-edge AI safety research.
Sessions will engage with recent papers across topics including mechanistic interpretability, AI control, scalable oversight, capability evaluation, and failure mode identification. The group emphasizes critical analysis and discussion.
Session 6: Control
Location: IKB 265
Session Plan
AI Control describes a broad class of research directions aimed at providing safety guarantees even when deploying models with potentially dangerous capabilities. The framing differs from alignment work: rather than making models want the right things, control aims to prevent misaligned models from successfully executing harmful plans. We'll examine proposed control protocols and ask what they can and can't guarantee. A sketch of one such protocol follows below. Dinner will be provided!
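To make the idea concrete, here is a minimal Python sketch of a trusted-monitoring protocol, one of the simplest control protocols discussed in the AI control literature: an untrusted (capable but possibly misaligned) model does the work, a weaker trusted monitor scores each output for suspicion, and the protocol audits or defers based on that score. All names here (untrusted_model, trusted_monitor, human_audit, the thresholds) are illustrative assumptions, not code from the session's readings.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ProtocolConfig:
    audit_threshold: float  # suspicion above which a human audits (assumed knob)
    defer_threshold: float  # suspicion above which we defer to the trusted model
    audit_budget: int       # limited number of human audits available

def trusted_monitoring(
    task: str,
    untrusted_model: Callable[[str], str],         # capable, possibly misaligned
    trusted_model: Callable[[str], str],           # weaker but trusted
    trusted_monitor: Callable[[str, str], float],  # suspicion score in [0, 1]
    human_audit: Callable[[str, str], bool],       # True if output judged safe
    config: ProtocolConfig,
    audits_used: int = 0,
) -> tuple[str, int]:
    """Run one task under trusted monitoring with deferral and auditing."""
    candidate = untrusted_model(task)
    suspicion = trusted_monitor(task, candidate)

    # Most suspicious outputs: spend a human audit while budget remains.
    if suspicion >= config.audit_threshold and audits_used < config.audit_budget:
        audits_used += 1
        if human_audit(task, candidate):
            return candidate, audits_used
        # Caught an unsafe output: fall back to the trusted model.
        return trusted_model(task), audits_used

    # Moderately suspicious outputs: defer to the trusted model.
    if suspicion >= config.defer_threshold:
        return trusted_model(task), audits_used

    # Low suspicion: accept the untrusted model's output.
    return candidate, audits_used
```

The thresholds and audit budget govern the safety/usefulness tradeoff that these protocols are evaluated on: auditing and deferring more often makes it harder for a misaligned model to slip a harmful action through, but wastes scarce human oversight and more of the capable model's work.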
Prereading:
This primer is a quick read and covers everything you need to know for this session. We highly recommend reading it before attending! You can browse through these posts if you want to explore further.
Who Should Attend:
Meetings are open to anyone interested in technical AI safety research. While no prior experience is required, participants with a working knowledge of AI safety and machine learning concepts will get the most out of the discussions. If you're unsure whether you have sufficient background, check out this preparation document, which lists resources on the topics you should be familiar with to engage fully with the material.