Presented by
Manifold Research
We're a mission-driven R&D institute dedicated to advancing fundamental discoveries and carrying them to real-world impact.

Frontiers: Architecting the next generation of multimodal benchmarks

About Event

Welcome to Frontiers - a series where we invite top researchers, engineers, designers, and leaders working at the cutting edge of their fields to go deep on their work with the Manifold Community.

For this talk, our speaker will be Pranav Guruprasad. Pranav is a Founding Research Engineer at Fig, and the Research and Engineering lead of the MultiNet research project at Manifold. At Fig, he works on advancing Multimodal Action Models for AI systems capable of executing complex, long-horizon digital tasks. Through MultiNet, he is developing comprehensive benchmarks to evaluate generalist Multimodal Action systems across diverse modalities, tasks, and domains, pushing the frontier of measurable and achievable progress in Multimodal AI.

Abstract

Frontier AI models exhibit a critical performance paradox: near-ceiling scores on canonical benchmarks (HumanEval, MMLU) coupled with systematic failures in production deployments. Current benchmarks predominantly assess narrow, specialized competencies - competition mathematics, doctoral-level reasoning, software engineering proficiency - while neglecting the foundational capabilities required for robust multimodal action models.

In this talk, we’ll explore these gaps and introduce MultiNet v1.0, a new benchmark that addresses these issues by systematically profiling state-of-the-art architectures across three model classes (VLMs, VLAs, generalist models) on a unified suite spanning common-sense reasoning, object detection, visual question answering, tool use, multi-agent coordination in discrete action spaces, and robotic manipulation and locomotion. Early results reveal stark capability disparities across modalities and task domains, exposing the architectural prerequisites, training methodologies, and essential data distributions for true multimodal generalization. These findings, and the MultiNet benchmark more broadly, establish a blueprint for the next generation of benchmarks designed for the functional intelligence era - where models must demonstrate not just knowledge, but reliable, generalizable action.

We’re growing our core team and pursuing new projects. If you’re interested in working together, see our website for active initiatives and open positions, join the conversation on Discord, and check out our GitHub.

If you want to see more of our updates as we work to explore and advance the field of Intelligent Systems, follow us on Twitter and LinkedIn!
