Presented by

Every week we pick one paper and go deep — video generation, world models, physical reasoning, diffusion, flow matching, and everything in between.

Can Models Think Without Language? Video as the Next Substrate of Intelligence by Zhongang Cai

Video Model Journal Club

Zoom

About Event

Abstract: Largely because of the remarkable success of large language models, reasoning is often viewed as inseparable from language. But is language the only medium through which intelligence can emerge? In this talk, I will present A Very Big Video Reasoning Suite (VBVR), an inaugural effort toward a new paradigm of model reasoning that uses video as the substrate of thought. Unlike language, video naturally captures rich spatial and temporal structure, opening new possibilities for how models perceive, reason, and make predictions. I will also share recent findings on how reasoning emerges through video generation, and on how these developments may reshape the future landscape of large multimodal models.

Speaker: Zhongang Cai, who received his Ph.D. from MMLab at Nanyang Technological University. His research focuses on spatial intelligence, 3D generation, and video reasoning.

Website: https://journal.video-reason.com/

To join over Zoom, please subscribe to receive the Zoom link: https://forms.gle/ebgyvtLRz8ABTfdX6