

Week 45: Attention Residuals: Rethinking Information Flow in LLMs
This paper introduces Attention Residuals (AttnRes), a novel architectural mechanism from the Kimi Team that rethinks how information flows through modern Large Language Models (LLMs). The central challenge the work addresses is that standard residual connections with PreNorm accumulate all layer outputs with fixed unit weights: each layer's input is simply the sum of the embedding and every preceding layer's output. This uniform aggregation causes uncontrolled hidden-state growth with depth and progressively dilutes each layer's unique contribution. To overcome this limitation, the authors replace the fixed accumulation with a softmax attention mechanism over preceding layer outputs, letting each layer selectively aggregate earlier representations with learned, input-dependent weights.
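
To make the mechanism concrete, here is a minimal PyTorch sketch of the idea as summarized above. It is an illustration under stated assumptions, not the paper's implementation: the class name `AttnResBlock`, the query/key projections that score earlier layers, the per-token softmax, and the feed-forward stand-in for the transformer sublayers are all hypothetical.

```python
# Minimal sketch of AttnRes-style aggregation: instead of feeding each block
# the plain sum of all earlier layer outputs (the PreNorm residual stream),
# the block mixes them with learned, input-dependent softmax weights.
# All names and the exact scoring function are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttnResBlock(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        # Stand-in for the block's attention/MLP sublayers.
        self.sublayer = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        # Hypothetical projections that score preceding layer outputs.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)

    def forward(self, history: list[torch.Tensor]) -> torch.Tensor:
        # history: embedding plus every earlier block's output,
        # each of shape (batch, seq, d_model).
        stack = torch.stack(history, dim=-2)            # (B, S, L, D)
        q = self.q_proj(history[-1]).unsqueeze(-2)      # (B, S, 1, D)
        k = self.k_proj(stack)                          # (B, S, L, D)
        scores = (q * k).sum(-1) / k.shape[-1] ** 0.5   # (B, S, L)
        weights = F.softmax(scores, dim=-1)             # input-dependent mix
        mixed = (weights.unsqueeze(-1) * stack).sum(-2) # (B, S, D)
        # PreNorm-style block update on the mixed representation.
        return mixed + self.sublayer(self.norm(mixed))


# Usage: each block reads the full history and appends its own output.
blocks = nn.ModuleList([AttnResBlock(256) for _ in range(4)])
x = torch.randn(2, 16, 256)  # token embeddings
history = [x]
for block in blocks:
    history.append(block(history))
output = history[-1]
```

One design point worth flagging: this sketch computes a separate softmax over layers for every token, which is one way to read "learned, input-dependent weights"; the paper's actual parameterization (per-token vs. per-layer weights, shared vs. separate projections) may differ.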
Join us at Mox!
🔎 Analyzed-paper discussion at 20:00; (optional) quiet reading from 19:00 to 20:00.