

90/30 Club (ML reading) #29: Nested Learning: The Illusion of Deep Learning Architectures
The paper: link here
A recent study by Google Research introduces Nested Learning (NL): a paradigm that redefines deep learning as a hierarchy of nested optimization problems rather than a stack of layers. The authors argue that modern neural networks, including Transformers, effectively compress their own context flow, and that in-context learning emerges naturally from this internal compression. Through this framework, common optimizers such as SGD and Adam are reinterpreted as associative memory modules that store and adapt to gradient information. Building on these insights, the paper presents HOPE, a self-referential model that learns to modify its own update rules using a continuum memory system inspired by human neuroplasticity.
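To make the "hierarchy of nested optimization problems" idea concrete before the session, here is a minimal NumPy sketch of a two-level setup: an inner loop adapts fast weights to the current context, while an outer loop updates slow, shared weights at a longer timescale. Everything here (the toy linear task, the Reptile-style outer step, names like `W_slow` and `inner_lr`) is illustrative, not the paper's actual construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task():
    """Draw a toy regression task: y = x @ W_true.T."""
    W_true = rng.normal(size=(2, 2))
    X = rng.normal(size=(16, 2))
    return X, X @ W_true.T

def grad(W, X, Y):
    """Gradient of 0.5 * mean over samples of ||x @ W.T - y||^2."""
    err = X @ W.T - Y
    return err.T @ X / len(X)

W_slow = np.zeros((2, 2))               # outer level: slow, shared parameters
inner_lr, outer_lr, inner_steps = 0.1, 0.05, 5

for step in range(200):
    X, Y = make_task()
    W_fast = W_slow.copy()              # inner level: fast, context-specific copy
    for _ in range(inner_steps):
        W_fast -= inner_lr * grad(W_fast, X, Y)
    # Outer update: pull the slow weights toward the adapted fast weights
    # (a first-order, Reptile-style step, used here only as a stand-in).
    W_slow += outer_lr * (W_fast - W_slow)
```

The point of the sketch is the separation of timescales: the inner loop "learns" within a single context, and the outer loop learns what to carry across contexts, which is one way to read the paper's nested-optimization framing.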
Empirical results across language modeling and reasoning benchmarks show that HOPE outperforms recent architectures like DeltaNet and Titans, particularly in long-context reasoning and continual learning. These findings suggest a new direction for AI model design, toward systems that learn across multiple timescales and self-optimize beyond traditional backpropagation.
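The paper's reading of optimizers as associative memory is also easy to see in miniature: classical momentum keeps an exponentially decayed summary of the gradient stream, "writing" each new gradient into a buffer and "reading" that buffer back at every update. A minimal, self-contained sketch on the same kind of toy linear task; `beta` and `lr` are illustrative values, not anything from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(16, 2))
W_true = rng.normal(size=(2, 2))
Y = X @ W_true.T

W = np.zeros((2, 2))
m = np.zeros_like(W)                    # memory buffer over the gradient stream
beta, lr = 0.9, 0.05

for step in range(200):
    g = (X @ W.T - Y).T @ X / len(X)    # gradient of the mean squared error
    m = beta * m + (1 - beta) * g       # write: fold the new gradient into memory
    W -= lr * m                         # read: recall and apply the stored direction
```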
Join us at Mox to explore:
- How might nested optimization structures allow models to generalize more efficiently by reusing global representations while adapting locally to new tasks?
- Could the hierarchical nature of nested learning make large models inherently more resilient to poisoning or backdoor attempts by isolating adversarial influence within inner-loop adaptations?
Discussion at 20:00; optional quiet reading from 19:00.