World Models for AI Agents
Dr. Uday Kamath is giving a guest lecture for our course on Agentic AI; more details about the course are at beyondvectors.com
Abstract
Inside the Dream: How an Agent Learns to Act in a World It Has Only Imagined
Most reinforcement learning agents become competent by repeatedly bumping into reality. They learn from millions of real interactions, each one expensive. Yet humans learn many skills in very different ways. A baseball batter has milliseconds to swing, far less than the time it takes visual signals to reach the brain, so the batter must be predicting rather than reacting. Skilled people make fast decisions by running an internal model of the world; the muscles act on the prediction, not on the raw sensory input.
In 2018, Ha and Schmidhuber published "World Models," a paper demonstrating that an artificial agent can be given exactly this capacity. The agent watches a small amount of random play, compresses what it sees into a learned latent space, builds a probabilistic model of how that space evolves under its actions, and then learns its policy entirely inside the resulting simulator. The real environment is touched only at the end, for evaluation.
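The recipe can be sketched end to end at toy scale. The snippet below is a minimal numpy illustration, not the paper's implementation: the VAE encoder is stood in by a fixed random projection, the MDN-RNN by a small stochastic linear RNN, and the reward by a made-up function of the latent state. All sizes, names, and the reward are assumptions chosen only to make the V/M/C control flow visible.

```python
import numpy as np

rng = np.random.default_rng(0)
Z, H, A = 8, 16, 2          # toy sizes; the paper uses z=32, h=256, 3-dim actions

# --- V: stand-in encoder (a fixed random projection in place of a trained VAE) ---
W_enc = rng.normal(size=(Z, 64)) / 8.0
def encode(obs):
    return W_enc @ obs

# --- M: stand-in dynamics model p(z' | z, a, h) (a stochastic linear RNN here) ---
W_h = rng.normal(size=(H, H + Z + A)) / 8.0
W_z = rng.normal(size=(Z, H)) / 4.0
def dream_step(z, a, h):
    h = np.tanh(W_h @ np.concatenate([h, z, a]))
    z_next = W_z @ h + 0.1 * rng.normal(size=Z)   # noisy next-state prediction
    return z_next, h

# --- C: tiny linear controller acting on the concatenated [z, h] ---
def act(params, z, h):
    W = params.reshape(A, Z + H)
    return np.tanh(W @ np.concatenate([z, h]))

def dream_return(params, z0, steps=20):
    """Roll the policy entirely inside the learned model -- 'the dream'.

    The real environment never appears in this loop; it would only be
    touched afterwards, to evaluate the trained controller.
    """
    z, h, total = z0, np.zeros(H), 0.0
    for _ in range(steps):
        a = act(params, z, h)
        z, h = dream_step(z, a, h)
        total += -np.sum(z ** 2)     # made-up reward predicted from the latent
    return total

# Score one imagined rollout for a random controller; an optimizer (CMA-ES in
# the paper) would search over `params` using only this imagined return.
params = rng.normal(size=A * (Z + H)) * 0.1
score = dream_return(params, encode(rng.normal(size=64)))
```

Because `dream_return` is just a function of the controller parameters, the policy search never needs gradients through the real environment; any black-box optimizer over `params` will do.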
This talk walks through the recipe in detail. We will start from the conceptual motivation: why model-based reinforcement learning matters, and what it costs to do RL the model-free way.

We will then unpack the three components of the World Models architecture: a variational autoencoder for vision, a mixture-density recurrent network for dynamics, and a small policy trained by gradient-based or evolution-based optimization against the dream. Each component will be illustrated with a working reproduction on a small custom environment, so the architectural choices are visible at a scale that fits in a single GPU hour. We will discuss the failure modes that motivated the original design choices, including why a single Gaussian dynamics model is insufficient and how an agent within a dream can learn to exploit its own simulator.

The final third of the talk surveys where the field has gone since: the Recurrent State-Space Model and the Dreamer family of methods, and what changes when world models scale to internet-scale video data. Students will leave with a complete mental model of the V/M/C decomposition, a concrete sense of how each piece is trained, and a working notebook they can run themselves.
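The single-Gaussian failure mode is easy to see numerically. Below is a toy numpy illustration (not the paper's MDN-RNN): from the same state and action, the next state lands near -1 or +1 with equal probability, as with a ball that may bounce left or right. A single Gaussian fit collapses to the mean, a state that never actually occurs, while a two-component mixture keeps both modes. The mixture parameters here are fit by simple thresholding, standing in for the MDN's learned output head.

```python
import numpy as np

rng = np.random.default_rng(0)

# Bimodal next-state samples: half near -1, half near +1, each with std 0.1.
n = 10_000
samples = np.where(rng.random(n) < 0.5,
                   rng.normal(-1.0, 0.1, n),
                   rng.normal(+1.0, 0.1, n))

# Single-Gaussian fit: the mean sits near 0, between the modes, and the
# fitted std is huge relative to either mode's true spread.
mu, sigma = samples.mean(), samples.std()

# Two-component mixture fit (by thresholding at 0, as a stand-in for
# maximum-likelihood training of an MDN head): weights, means, stds.
left, right = samples[samples < 0], samples[samples >= 0]
pis = np.array([left.size, right.size]) / n
mus = np.array([left.mean(), right.mean()])
sigmas = np.array([left.std(), right.std()])

def mdn_sample(k):
    """Draw from the mixture: pick a component, then sample from it."""
    comp = rng.choice(2, size=k, p=pis)
    return rng.normal(mus[comp], sigmas[comp])
```

A dream built on the single Gaussian would hallucinate trajectories through states near 0 that the real environment never visits; sampling from the mixture keeps imagined rollouts on the two genuine branches, which is exactly why the M component predicts a mixture density rather than a point or a single Gaussian.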