

Multimodal Weekly 73: Video-Language Models, Reasoning-Across-Time in Videos, Long-Horizon Multimodal Inference, and Scaling Vision Encoders
In the 73rd session of Multimodal Weekly, we have four exciting presentations on video-language models, reasoning-across-time in videos, long-horizon multimodal inference, and scaling vision encoders for multimodal models.
✅ Peter Yu will present Espresso, a novel method that separately extracts and compresses spatial and temporal information in videos.
✅ Jr-Jen Chen will present ReXTime, a benchmark designed to rigorously test AI models' ability to perform temporal reasoning within video events.
✅ Zhuoyi Huang will present MARPLE, a benchmark for evaluating long-horizon inference capabilities using multi-modal evidence.
✅ Jieneng Chen will present a study analyzing the redundancy of visual tokens and efficient training in large multimodal models.
Join the Multimodal Minds community to connect with the speakers!
Multimodal Weekly is organized by Twelve Labs, a startup building multimodal foundation models for video understanding. Learn more about Twelve Labs here: https://twelvelabs.io/