

Hosts: Junfan Zhu, Aurora Feng
discord.gg/WH7DrTHRXK
Robotics & World Models Reading Club 09: CVPR Warm-up & Founders Spotlight — DeltaWorld + VisuoTactile Dexterous Hands | San Francisco 0523
A high-signal reading group for AI researchers & builders pushing the frontiers of robotic world models, WAMs, and embodied intelligence. In our previous sessions, we brought together researchers and engineers from Boston Dynamics, Google, NVIDIA, Stanford, UC Berkeley, CMU, Dyna, ByteDance, Tesla, Generalist, Rhoda AI, and leading Bay Area robotics startups.
Hosted by Junfan Zhu & Aurora Feng.
Supported by Neural Motion, a universal cross-embodiment data representation layer for embodied AI.
Reading Club 09's Core Theme
Keynote 1 by Tommie Kerssies (Amazon Frontier AI & Robotics)
A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens
Anticipating diverse future states is a central challenge in video world modeling. Discriminative world models produce a deterministic prediction that implicitly averages over possible futures, while existing generative world models remain computationally expensive. Recent work demonstrates that predicting the future in the feature space of a vision foundation model (VFM), rather than a latent space optimized for pixel reconstruction, requires significantly fewer world model parameters. However, most such approaches remain discriminative. In this work, we introduce DeltaTok, a tokenizer that encodes the VFM feature difference between consecutive frames into a single continuous "delta" token, and DeltaWorld, a generative world model operating on these tokens to efficiently generate diverse plausible futures. Delta tokens reduce video from a three-dimensional spatio-temporal representation to a one-dimensional temporal sequence, for example yielding a 1,024x token reduction with 512x512 frames. This compact representation enables tractable multi-hypothesis training, where many futures are generated in parallel and only the best is supervised. At inference, this leads to diverse predictions in a single forward pass. Experiments on dense forecasting tasks demonstrate that DeltaWorld forecasts futures that more closely align with real-world outcomes, while having over 35x fewer parameters and using 2,000x fewer FLOPs than existing generative world models.
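For readers preparing for the session, the two quantitative ideas in the abstract can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the patch size of 16 is an assumption (the abstract only states the 1,024x reduction at 512x512), the mean-squared-error scoring is an assumption, and the helper names `tokens_per_frame` and `best_of_k_loss` are hypothetical.

```python
def tokens_per_frame(height, width, patch_size):
    """Number of patch tokens a ViT-style backbone emits for one frame."""
    return (height // patch_size) * (width // patch_size)


def best_of_k_loss(hypotheses, target):
    """Best-of-K (winner-takes-all) loss: score all hypotheses, keep only the closest.

    hypotheses: list of K candidate delta tokens, each a list of floats
    target:     ground-truth delta token, a list of floats
    Returns (index of the best hypothesis, its mean squared error).
    MSE is an assumed scoring function chosen for illustration.
    """
    def mse(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(b)

    errors = [mse(h, target) for h in hypotheses]
    best = min(range(len(errors)), key=errors.__getitem__)
    return best, errors[best]


# Assuming 16x16 patches, a 512x512 frame yields 32 * 32 = 1024 patch tokens,
# so one delta token per frame corresponds to a 1,024x reduction.
reduction = tokens_per_frame(512, 512, patch_size=16) // 1

# Three hypothetical futures; only the one closest to the target is supervised.
best_index, best_error = best_of_k_loss(
    [[0.0, 0.0], [1.0, 1.0], [0.9, 1.2]],
    target=[1.0, 1.0],
)
```

In a real training loop, only the winning hypothesis would receive gradient, which is what lets the remaining hypotheses stay diverse rather than collapse to an average future.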
Keynote 2 by Arjun Subramaniam (Factory Intelligence)
ManiFeel
Pre-Readings
Keynote 1 by Tommie Kerssies (Amazon Frontier AI & Robotics)
A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens
TiTok: https://arxiv.org/abs/2406.07550
DINOv3: https://arxiv.org/pdf/2508.10104
Back to the Features: DINO as a Foundation for Video World Models: https://arxiv.org/abs/2507.19468
Keynote 2 by Arjun Subramaniam (Factory Intelligence)
ManiFeel
Location
San Francisco (Downtown)
Date & Time
Saturday, May 23, 2026 | 2:00 PM – 5:00 PM
Join Discord Community
https://discord.gg/WH7DrTHRXK
Follow Saturday Robotics on X
https://x.com/saturdayrobotic
Agenda
2:00 PM – 2:30 PM Doors Open & Social
Food 😋, beverages 🧋, and UNLIMITED strawberries 🍓 (our official reading club fruit ☺️😄).
2:30 PM – 3:00 PM Keynote 1 by Tommie Kerssies (Amazon Frontier AI & Robotics), DeltaWorld: Generating Diverse Video Futures with Delta Tokens
3:00 PM – 3:30 PM Keynote 2 by Arjun Subramaniam (Factory Intelligence), ManiFeel
Online access via Zoom: TBD
YouTube Recording: TBD (We are looking for recording volunteers)
3:30 PM – 5:00 PM Q&A, open-floor roundtable (10–20 min per topic) on spotlight papers or any paper you’d like to highlight. Feel free to share why the paper matters and its technical details.
Future events
#cvpr-denver-meetup-0606: Saturday Robotics CVPR & World Models Researchers Meetup—Denver 0606
CVPR Denver Luma: https://luma.com/zamm9g2g
Past events
#reading-club-08-0516: Embodied Human Data as the “Internet of Motion and Behavior”
Session 08 Luma: https://luma.com/qoxioge7
#reading-club-07-0509: Learning to Dream: World Models, Imagination, Path to Foundation Models for Control
Session 07 Luma: https://luma.com/srhe0vuo
#reading-club-06-0502: Evolution of Video World Models for Robotics
Session 06 Luma: https://luma.com/sdrd4zwr
Reading Club 06 Review: https://x.com/junfanzhu98/status/2050834699275383008?s=20
#reading-club-05-0425: World Models for Physical Intelligence: From Predictive Brains to Embodied Robots
Session 05 Luma: https://luma.com/p7zvpyvg
Reading Club 05 Review: https://x.com/junfanzhu98/status/2048315020946317710?s=20
YouTube Recording: https://youtu.be/RVy6oQXNDgc?si=u2VLtCBjfdMvXaf-
#reading-club-04-0418: Abstractions of the Physical World for Decision-Making
Session 04 Luma: https://luma.com/atv7bm3i
Reading Club 04 Review: https://x.com/junfanzhu98/status/2045770010979905862
YouTube Recording: https://www.youtube.com/@saturdayrobotic
#reading-club-03-0411: Robotic Policy Adaptation
Session 03 Luma: https://luma.com/561xgirg
Reading Club 03 Review: https://x.com/junfanzhu98/status/2043243484568768519?s=20
YouTube Recording: https://www.youtube.com/@saturdayrobotic
#reading-club-02-0404: JEPA Zoo
Session 02 Luma: https://luma.com/g3qrrti0
Reading Club 02 Review (liked by Yann LeCun on X): https://x.com/junfanzhu98/status/2040716119259164673?s=20
#reading-club-01-0328
Session 01 Luma: https://luma.com/8s4w1wu6
Reading Club 01 Review (liked by Yann LeCun on X): https://x.com/junfanzhu98/status/2038153945219305812
Logistics
Spots are limited. Please arrive by 2:00 PM for check-in. The first keynote will begin promptly at 2:30 PM.
We currently do not have volunteers available to assist with late check-ins. Given the high volume of inquiries and 100+ attendees (both online and onsite), we kindly ask that you arrive on time to ensure smooth entry.