Presented by
Manifold Research
We're a mission-driven R&D institute dedicated to advancing fundamental discoveries and carrying them to real-world impact.

Frontiers: Architecting the next generation of multimodal benchmarks

About Event

Welcome to Frontiers - a series where we invite top researchers, engineers, designers, and leaders working at the cutting edge of their fields to go deep on their work with the Manifold Community.

For this talk, our speaker will be Pranav Guruprasad. Pranav is a Founding Research Engineer at Fig, and the Research and Engineering lead of the MultiNet research project at Manifold. At Fig, he works on advancing Multimodal Action Models for AI systems capable of executing complex, long-horizon digital tasks. Through MultiNet, he is developing comprehensive benchmarks to evaluate generalist Multimodal Action systems across diverse modalities, tasks, and domains, pushing the frontier of measurable and achievable progress in Multimodal AI.

Abstract

Frontier AI models exhibit a critical performance paradox: near-ceiling scores on canonical benchmarks (HumanEval, MMLU) coupled with systematic failures in production deployments. Current benchmarks predominantly assess narrow, specialized competencies - competition mathematics, doctoral-level reasoning, software engineering proficiency - while neglecting the foundational capabilities required for robust multimodal action models.

In this talk, we’ll explore these gaps and introduce MultiNet v1.0, a new benchmark that addresses these issues by systematically profiling state-of-the-art architectures across three model classes (VLMs, VLAs, generalist models) on a unified suite spanning common-sense reasoning, object detection, visual question answering, tool use, multi-agent coordination in discrete action spaces, and robotic manipulation and locomotion. Early results reveal stark capability disparities across modalities and task domains, exposing the architectural prerequisites, training methodologies, and essential data distributions for true multimodal generalization. These findings, and the MultiNet benchmark more broadly, establish a blueprint for the next generation of benchmarks designed for the functional intelligence era - where models must demonstrate not just knowledge, but reliable, generalizable action.

We’re growing our core team and pursuing new projects. If you’re interested in working together, see our website for active initiatives and open positions, join the conversation on Discord, and check out our GitHub.

If you want to see more of our updates as we work to explore and advance the field of Intelligent Systems, follow us on Twitter and LinkedIn!
