F2025 - Technical Paper Reading Group Week 7 - Unlearning
UBC AI Safety Technical Paper Reading Group
UBC AI Safety has launched a biweekly technical paper reading group focused on cutting-edge AI safety research.
Sessions will engage with recent papers across topics including mechanistic interpretability, AI control, scalable oversight, capability evaluation, and failure mode identification. The group emphasizes critical analysis and discussion.
Session 7: Unlearning
Unlearning techniques aim to selectively remove information from a trained model. Approaches include gradient-based forgetting, representation-editing methods such as ROME, and various fine-tuning schemes. The central open question is whether these interventions genuinely delete knowledge or merely make it harder to access. In this session, we'll critically examine these methods and the evidence for and against their robustness. Dinner will be provided.
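For a flavor of the first of these, here is a minimal sketch of gradient-based forgetting: gradient ascent on a small "forget set," i.e., maximizing the language-modeling loss on data the model should forget. The model choice (gpt2) and the toy forget set are illustrative assumptions, not taken from any particular paper; published methods usually add a retain-set term or a KL penalty to preserve general capability.

```python
# Sketch only: naive gradient-ascent unlearning on a toy forget set.
# Model, learning rate, and forget_texts are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()

forget_texts = ["Some fact the model should forget."]  # toy forget set
batch = tokenizer(forget_texts, return_tensors="pt")

def unlearn_step(batch):
    """One gradient-ascent step: negate the LM loss on forget data,
    so the optimizer *increases* the loss on that data."""
    outputs = model(**batch, labels=batch["input_ids"])
    loss = -outputs.loss  # ascent via negated loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return outputs.loss.item()

for step in range(3):  # a few steps; real methods add safeguards
    print(f"forget-set loss: {unlearn_step(batch):.3f}")
```

Note that unconstrained ascent like this tends to degrade the whole model, which is one reason the robustness question above matters: a method can look like it deleted knowledge while mostly just damaging or masking it.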
Prereading:
No prereading this week!
Who Should Attend:
Meetings are open to anyone interested in technical AI safety research. No prior experience is required, but participants with a working knowledge of AI safety and machine learning concepts will get the most out of the discussions. If you're unsure whether you have sufficient background, check out this preparation document, which lists resources on the topics you'll want to be familiar with to engage fully with the material.