Cover Image for Can Models Think Without Language? Video as the Next Substrate of Intelligence by Zhongang Cai
Avatar for Video Model Journal Club
Every week we pick one paper and go deep — video generation, world models, physical reasoning, diffusion, flow matching, and everything in between.
Hosted By
48 Going

Can Models Think Without Language? Video as the Next Substrate of Intelligence by Zhongang Cai

Zoom
About Event

Abstract: Reasoning is often viewed as inseparable from language, a view driven by the remarkable success of large language models. But is language the only medium through which intelligence can emerge? In this talk, I will present A Very Big Video Reasoning Suite (VBVR), an inaugural effort toward a new paradigm of model reasoning that uses video as the substrate of thought. Unlike language, video naturally captures rich spatial and temporal structure, opening new possibilities for how models perceive, reason, and make predictions. I will also share recent findings on how reasoning emerges through video generation, and how these developments may reshape the future landscape of large multimodal models.

Speaker: Zhongang Cai, Ph.D. from MMLab, Nanyang Technological University. His research focuses on spatial intelligence, 3D generation, and video reasoning.

Website: https://journal.video-reason.com/

To join over Zoom, please subscribe to receive the Zoom link: https://forms.gle/ebgyvtLRz8ABTfdX6
