Past Event
About Event

Join us on Feb 11th for the third edition of the Unaite Paper Club, featuring Quentin Garrido, research scientist at Meta FAIR!

This talk explores recent developments in World Models and the learning of expressive and efficient latent spaces from video.

We will begin with V-JEPA 2 [1], a state-of-the-art video encoder trained with self-supervised learning by predicting missing parts of videos in latent space. We will then examine the intuitive physics understanding the model exhibits when predicting the future [2].
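The core training idea, mask part of a video and regress its latent representation rather than its pixels, can be illustrated with a toy sketch. Everything below (the linear "encoders", dimensions, masking ratio) is illustrative and not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the real networks: a context encoder, a predictor,
# and a separate target encoder (in practice an EMA copy) that provides
# the latent regression targets. All names here are hypothetical.
D_IN, D_LAT, N_PATCH = 16, 8, 10          # patch dim, latent dim, patches per clip
W_ctx = rng.normal(size=(D_IN, D_LAT))    # context encoder weights
W_tgt = rng.normal(size=(D_IN, D_LAT))    # target encoder weights
W_pred = rng.normal(size=(D_LAT, D_LAT))  # predictor weights

def masked_latent_loss(patches, mask):
    """L2 loss between predicted and target latents, on masked patches only."""
    ctx = patches.copy()
    ctx[mask] = 0.0                       # hide masked patches from the context
    z_ctx = ctx @ W_ctx                   # encode the visible context
    z_hat = z_ctx @ W_pred                # predict latents for every position
    z_tgt = patches @ W_tgt               # targets come from the full clip
    diff = z_hat[mask] - z_tgt[mask]      # loss lives in latent space,
    return float(np.mean(diff ** 2))      # never in pixel space

patches = rng.normal(size=(N_PATCH, D_IN))  # one "video clip" of patch tokens
mask = rng.random(N_PATCH) < 0.5            # randomly mask about half the patches
loss = masked_latent_loss(patches, mask)
print(loss)
```

Predicting in latent space rather than pixel space is what lets the encoder ignore unpredictable low-level detail and focus on semantic content.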

Finally, we will discuss how to learn a world model that predicts the effect of physical actions from videos that contain no action information, by learning a Latent Action World Model [3]. We will demonstrate how such a model can be used to solve planning tasks in robotics and navigation.
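Once a world model exists, planning can be as simple as searching over action sequences in latent space. Below is a minimal random-shooting planner over a toy linear world model; the dynamics, dimensions, and names are all invented for illustration and bear no relation to the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear world model: next latent state = A @ z + B @ a.
D_Z, D_A, HORIZON, N_SAMPLES = 4, 2, 5, 256
A = np.eye(D_Z) * 0.9
B = rng.normal(size=(D_Z, D_A)) * 0.5

def rollout(z0, actions):
    """Roll the world model forward through a sequence of (latent) actions."""
    z = z0
    for a in actions:
        z = A @ z + B @ a
    return z

def plan(z0, z_goal):
    """Random-shooting planner: sample action sequences, keep the cheapest."""
    best_cost, best_seq = np.inf, None
    for _ in range(N_SAMPLES):
        seq = rng.normal(size=(HORIZON, D_A))
        cost = np.linalg.norm(rollout(z0, seq) - z_goal)  # latent-space distance
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq, best_cost

z0 = rng.normal(size=D_Z)       # current observation, already encoded
z_goal = rng.normal(size=D_Z)   # goal observation, encoded the same way
seq, cost = plan(z0, z_goal)
print(seq.shape, cost)
```

The same recipe scales up by swapping the linear model for a learned one and the random search for a stronger optimizer such as the cross-entropy method.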

[1] Assran, Mido, et al. "V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning." https://arxiv.org/abs/2506.09985

[2] Garrido, Quentin, et al. "Intuitive Physics Understanding Emerges from Self-Supervised Pretraining on Natural Videos." https://arxiv.org/abs/2502.11831

[3] Garrido, Quentin, et al. "Learning Latent Action World Models in the Wild." https://arxiv.org/abs/2601.05230

Location
16 Rue de l'Estrapade
75005 Paris, France