WORLD MODELS — A Paper Workshop
World models are becoming one of AI’s central bets, but the field still cannot agree on what a world model should actually be.
The session will focus on a tension that now feels less theoretical and more practical:
Should world models learn abstract latent dynamics, or should they stay grounded in pixels, video, and future visual prediction?
At first, avoiding pixels sounds elegant. JEPA-style models argue that intelligence should emerge from predicting compact latent states, not reconstructing every visual detail.
But recent video-based approaches make the opposite intuition harder to dismiss. Models such as DreamZero, DreamDojo, Cosmos-style policies, and DiT4DiT suggest that video priors may carry dense information about motion, contact, occlusion, and physical change — exactly the kind of structure a robot needs.
Then there are approaches like LeWM, which push in another direction: compact latent dynamics learned from pixels, without an EMA teacher or explicit reconstruction.
So the real question is not simply pixel versus latent space.
It is what kind of representation actually buys control.
World Action Models bring this debate into robotics: prediction is useful, but prediction alone is not control. A robot needs action grounding, closed-loop correction, and representations that survive contact with the physical world.
That is the gap where many “world models” quietly turn out to be little more than a generative prior with better branding.
We’ll close with an open technical discussion around what holds up outside toy settings, what breaks in real robotic control, and which directions still look genuinely promising.
Date
Tuesday, June 23
Time
19:30 CEST
Location
Paris or Online
Pizzas and drinks?
Of course.
