

90/30 Club (ML reading) #42: Self-Distillation Enables Continual Learning
Paper Link
This paper introduces Self-Distillation Fine-Tuning (SDFT), a framework that lets foundation models continually acquire new skills and knowledge without forgetting previously learned capabilities. The authors argue that most current post-training methods, especially supervised fine-tuning (SFT), are inherently off-policy and therefore cause catastrophic forgetting when models are sequentially adapted to new tasks. SDFT addresses this by converting demonstration-based learning into an on-policy training process via self-distillation.

The key idea is to have the model act as both teacher and student simultaneously: the teacher is the model conditioned on expert demonstrations, while the student is the same model without demonstrations. By training the student to match the teacher's behavior on trajectories the student itself generates, SDFT produces on-policy updates that preserve prior capabilities while still enabling new skill and knowledge acquisition. Empirically, the authors show that SDFT consistently outperforms standard SFT, achieving higher new-task accuracy, stronger generalization, and dramatically reduced catastrophic forgetting on multi-task sequential learning benchmarks.
This work provides a new perspective on post-training and continual learning, showing how in-context learning can act as an implicit reward signal, bridging imitation learning, reinforcement learning, and distillation into a unified framework for self-improving foundation models.
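The teacher/student setup above can be sketched in a few lines. This is a hedged toy illustration, not the authors' code: `toy_model` is a stand-in for a language model's next-token distribution, and all names and shapes here are assumptions for illustration.

```python
# Toy sketch of the SDFT idea (hypothetical, not the paper's implementation):
# the SAME model plays teacher (conditioned on an expert demonstration) and
# student (no demonstration); the student is trained to match the teacher's
# next-token distribution on trajectories the student itself generated.
import math
import random

VOCAB = ["a", "b", "c"]

def toy_model(context, temperature=1.0):
    """Stand-in for an LM: deterministic logits derived from the context."""
    logits = [math.sin(hash((context, tok)) % 97) / temperature for tok in VOCAB]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]  # next-token probabilities

def kl(p, q):
    """KL(p || q): the distillation loss between teacher p and student q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def sdft_step(prompt, demonstration, rng, horizon=4):
    # 1. The student (no demonstration in context) samples an on-policy trajectory.
    traj = []
    for _ in range(horizon):
        probs = toy_model(prompt + "".join(traj))
        traj.append(rng.choices(VOCAB, weights=probs)[0])
    # 2. At each step, the teacher is the same model with the demonstration
    #    prepended; the student is scored on the student's own trajectory.
    losses = []
    for t in range(len(traj)):
        ctx = prompt + "".join(traj[:t])
        teacher = toy_model(demonstration + ctx)  # in-context "teacher"
        student = toy_model(ctx)
        losses.append(kl(teacher, student))  # gradients would update the student
    return sum(losses) / len(losses)

loss = sdft_step("solve:", "demo: reverse the string\n", random.Random(0))
```

Because the trajectory is sampled from the student itself, the update is on-policy, which is exactly the property the paper credits with reducing catastrophic forgetting.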
Join us at Mox to explore:
- Are anti-distillation TOCs at odds with AGI?
- Why on-policy learning dramatically reduces catastrophic forgetting
- How in-context learning can act as an implicit reward function
- When SDFT outperforms SFT, RLHF-style pipelines, and continual pre-training
Discussion at 20:00, (optional) quiet reading from 19:00.