Demystifying Video Reasoning by Ruisi Wang
Abstract: Recent advances in video generation have revealed an unexpected phenomenon: diffusion-based video models exhibit non-trivial reasoning capabilities. We challenge the Chain-of-Frames assumption and uncover a fundamentally different mechanism, Chain-of-Steps (CoS), in which reasoning emerges along the diffusion denoising steps. We identify several emergent reasoning behaviors: working memory, self-correction, and perception before action.
Speaker: Ruisi Wang — Researcher with a background in computer science from Nanyang Technological University, working on computer vision, video reasoning, and spatial intelligence.
Website: https://journal.video-reason.com/
To join over Zoom, please subscribe to receive the Zoom link: https://forms.gle/ebgyvtLRz8ABTfdX6