

AI Reading Club - Where models store factual and linguistic knowledge
Transformer Feed-Forward Layers Are Key-Value Memories (Geva et al., 2020)
Paper: https://arxiv.org/abs/2012.14913
After several sessions focused on attention, this paper turns to another major component of the Transformer: the feed-forward layers. The authors show that each feed-forward layer can be read as a key-value memory: the first weight matrix holds keys that detect patterns in the input, and the second holds values that contribute the associated information to the layer's output.
We will discuss what this means for understanding where models store factual and linguistic knowledge, why attention alone is not enough to explain Transformer behaviour, and how this paper changed the direction of interpretability work.
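To make the key-value reading concrete before the session, here is a minimal PyTorch sketch of the idea. It is not the paper's code: the dimensions, random weights, and the top-k inspection at the end are illustrative assumptions, but the computation matches the paper's framing of a feed-forward layer as FF(x) = f(x·K^T)·V.

```python
# Minimal sketch of the key-value view of a Transformer feed-forward layer.
# Dimensions and weights are made up for illustration; this is not the paper's code.
import torch

d_model, d_ff = 8, 32              # hidden size and FF inner size (assumed)
K = torch.randn(d_ff, d_model)     # each row is a "key": an input pattern detector
V = torch.randn(d_ff, d_model)     # each row is a "value": information written to the output

x = torch.randn(d_model)           # a single token's hidden state

# Standard FFN computation, FF(x) = f(x K^T) V, here with ReLU as f.
memory_coeffs = torch.relu(K @ x)  # how strongly each key matches the input
output = V.T @ memory_coeffs       # weighted sum of values = the layer's output

# The memory reading: the top-activated keys name the patterns the layer
# "recognized", and their values dominate what it adds to the residual stream.
top = torch.topk(memory_coeffs, k=3)
print(top.indices, top.values)
```

With trained weights rather than random ones, inspecting which inputs fire a given key (and what its value promotes in the output vocabulary) is essentially the analysis the paper performs.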
Session format
10-15 minute overview by the discussion lead
About 45 minutes of group discussion
Discussion lead: TBD
Discussion prompts
If feed-forward layers store knowledge, what changes about how we think models remember facts?
How convincing is the key-value memory interpretation, and where might it break down?
What does this paper add after our earlier readings on BERT attention and “Attention is not Explanation”?
Join
Discord: https://discord.gg/5rAMsuVXXp