Can Models Think Without Language? Video as the Next Substrate of Intelligence by Zhongang Cai
Abstract: Reasoning is often viewed as inseparable from language, driven by the remarkable success of large language models. But is language the only medium through which intelligence can emerge? In this talk, I will present A Very Big Video Reasoning Suite (VBVR), an inaugural effort toward a new paradigm of model reasoning that uses video as the substrate of thought. Unlike language, video naturally captures rich spatial and temporal structure, opening new possibilities for how models perceive, reason, and make predictions. I will also share recent findings on how reasoning emerges through video generation, and how these developments may reshape the future landscape of large multimodal models.
Speaker: Zhongang Cai — Ph.D. from MMLab, Nanyang Technological University, where his research focuses on spatial intelligence, 3D generation, and video reasoning.
Website: https://journal.video-reason.com/
To join over Zoom, please subscribe to get the Zoom link: https://forms.gle/ebgyvtLRz8ABTfdX6