

MLn Club (ML Reading Group) #7: In-context Learning and Induction Heads
Week 7: In-context Learning and Induction Heads
When does a model suddenly learn to learn — and can you spot the moment on the loss curve?
If a circuit is defined to do nothing but copy random text, why does the same circuit also translate French?
In-context Learning and Induction Heads, by Olsson, Elhage, Nanda, et al. (Anthropic)
This work argues that induction heads may be the mechanism behind the majority of all in-context learning in large transformer models.
Induction heads "complete the pattern" by looking back for an earlier occurrence of the current token and copying what came after it: given a sequence like [A][B] ... [A], they predict [B].
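To make the [A][B] ... [A] -> [B] rule concrete, here is a minimal symbolic sketch of the behavior only, not of how attention heads actually implement it. The function name `induction_predict` is hypothetical, and real induction heads match fuzzily over learned representations rather than by exact token equality.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def induction_predict(tokens: Sequence[T]) -> Optional[T]:
    """Toy induction rule: [A][B] ... [A] -> predict [B].

    Hypothetical helper for illustration. Finds the most recent earlier
    occurrence of the current (last) token and returns the token that
    followed it. Returns None if the current token has not appeared before.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # "Prefix matching": scan earlier positions from right to left.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # "Copying": predict whatever came right after the match.
            return tokens[i + 1]
    return None
```

For example, `induction_predict(["the", "cat", "sat", "on", "the"])` returns `"cat"`. Because the rule ignores what the tokens mean, it works just as well on arbitrary random tokens.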
Early in training, models undergo a sudden "phase change" during which most of their in-context learning ability (measured as the difference in loss between tokens early and late in the sequence) is acquired. At the same time, induction heads form that can implement fairly abstract, fuzzy versions of pattern completion.
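The loss-difference measure can be sketched as follows. The paper's "in-context learning score" compares the loss on the 500th token of the context with the loss on the 50th, averaged over examples; here both positions are parameters, and `icl_score` is a hypothetical helper that assumes you already have per-token losses from some model.

```python
from statistics import mean
from typing import Sequence


def icl_score(per_token_losses: Sequence[Sequence[float]],
              early: int = 50, late: int = 500) -> float:
    """Mean over examples of (loss at the late-th token - loss at the early-th token).

    Hypothetical helper. per_token_losses[k][t] is the loss on token t
    (0-indexed) of example k. `early` and `late` are 1-indexed positions,
    matching phrasing like "the 50th token". A more negative score means
    later tokens are predicted better, i.e. the model uses its context more.
    """
    diffs = []
    for losses in per_token_losses:
        if len(losses) < late:
            raise ValueError("sequence is shorter than the late position")
        diffs.append(losses[late - 1] - losses[early - 1])
    return mean(diffs)
```

With a toy input, `icl_score([[3.0, 2.5, 2.0, 1.5]], early=2, late=4)` gives `-1.0`: loss dropped by one unit between the 2nd and 4th tokens. Tracking this score across checkpoints is one way to look for the sudden jump on a training curve.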
Together, these claims build a circumstantial case that induction heads may be responsible for the majority of in-context learning in state-of-the-art transformer models.
Join us at CASI for (optional) quiet reading from 7 pm and discussion at 8 pm.