

5 Tables at CVPR | The Reasoning Gap in Visual AI
The Reasoning Gap
Current AI can tell you what's in an image.
It cannot tell you what it means.
That gap — between describing the visual world and understanding it — is the most important unsolved problem in AI today.
It's why robotics keeps breaking in the real world. Why medical imaging models hallucinate. Why autonomous vehicles still can't read a construction zone the way a human driver does in two seconds.
Saturday evening at CVPR, GMI Cloud, NVIDIA, Elorian, NEA, and Twelve Labs are gathering 50 researchers, engineers, founders, and investors to go further than the conference floor allows.
The format:
The room breaks into five hosted tables, each anchored by one host, each built around a specific unsolved question in visual reasoning:
Compute — Is the inference stack we've built for language the wrong foundation for visual reasoning?
Infrastructure — When does throwing more transformer compute at the problem stop working?
Research — What would a real benchmark for visual reasoning actually look like?
Capital — How do you tell the difference between a company that's closing the gap and one that's just describing it better?
Application — What's the first product that only becomes possible when AI can truly understand video?
After that, the tables dissolve. The rest of the night is yours.
Who belongs in this room:
CVPR researchers in vision, multimodal AI, robotics, and embodied systems
ML engineers building visual pipelines in production
Founders working on the application layer of visual AI
Investors looking at where the gap gets closed first
The goal is simple: have the conversation the conference schedule doesn't leave room for, with the people who are actually building what comes next.
Dress code: slightly elevated / cocktail
Saturday night. By invite only.
About the hosts
GMI Cloud: One of seven NVIDIA Reference Platform Cloud Partners globally. GMI provides the GPU infrastructure powering the next generation of visual and multimodal AI, from large-scale training to production inference.
NVIDIA: The computing platform behind modern AI. NVIDIA's GPU architecture and software stack power the models, research, and systems pushing the frontier of visual intelligence forward.
NEA: One of the world's leading venture capital firms, backing foundational companies across AI, enterprise, and deep tech. NEA has invested in Twelve Labs, World Labs, ElevenLabs, and many of the defining companies in the current AI wave.
Elorian: Elorian is building the foundation of visual reasoning. The company believes that building systems that natively understand and reason through the visual medium the way humans do is the most important work in AI today and a critical step toward the future.
Twelve Labs: The leading video understanding AI platform. Twelve Labs builds models that search, understand, and extract meaning from video, moving beyond transcription toward genuine semantic comprehension of visual content.
GMI Cloud · NVIDIA · Elorian · NEA · Twelve Labs