Strange Evals - VLMs

Name: Strange Evals - VLMs
Start: 2026-05-12T18:30:00.000-07:00
End: 2026-05-12T20:00:00.000-07:00
Location: San Francisco, CA

Hosted by Parth A. Patel & 4 others

About Event

A paper reading club, but we deep-dive into benchmarks.

---

Each session we dive deep into a battery of related, widely cited benchmarks so we can develop an intuition for what we're up against. This includes reading papers, but also eyeballing raw benchmark samples (a surprising number of them don't make any sense).

This week we are discussing the big VLM benchmarks to find out why almost all of them are already fully saturated.

---

Pre-reading:

Read the ZeroBench paper: https://zerobench.github.io

OR

Read the MMMU-Pro paper: https://arxiv.org/abs/2409.02813

OR

Spend 30 minutes eyeballing samples from some of these benchmarks (https://github.com/Lewington-pitsos/strange-vlm) and come up with something interesting.

---

Special thanks to HUD for hosting us: https://www.hud.ai

Attendance will be limited to keep discussion focused.

Food web image source:
Kembhavi, Aniruddha, et al. "A Diagram Is Worth a Dozen Images." Computer Vision – ECCV 2016, Springer, 2016, pp. 235–251.
