Presented by
Saturday Robotics
🤖 Saturday Reading Club on Robotics & World Models for AI Researchers in SF
Hosts: Junfan Zhu, Aurora Feng
discord.gg/WH7DrTHRXK
87 Went

Robotics & World Models Reading Club 09: CVPR Warm-up & Founders Spotlight — DeltaWorld + VisuoTactile Dexterous Hands | San Francisco 0523

Register to See Address
San Francisco, CA
Registration
Past Event
About Event

A high-signal reading group for AI researchers & builders pushing the frontiers of robotic world models, WAMs, and embodied intelligence. In our previous sessions, we brought together researchers and engineers from Boston Dynamics, Google, NVIDIA, Stanford, UC Berkeley, CMU, Dyna, ByteDance, Tesla, Generalist, Rhoda AI, and leading Bay Area robotics startups.

Hosted by Junfan Zhu & Aurora Feng.

Supported by Neural Motion, a universal cross-embodiment data representation layer for embodied AI.

Reading Club 09's Core Theme

Keynote 1 by Tommie Kerssies (Amazon Frontier AI & Robotics)

A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens

Anticipating diverse future states is a central challenge in video world modeling. Discriminative world models produce a deterministic prediction that implicitly averages over possible futures, while existing generative world models remain computationally expensive. Recent work demonstrates that predicting the future in the feature space of a vision foundation model (VFM), rather than a latent space optimized for pixel reconstruction, requires significantly fewer world model parameters. However, most such approaches remain discriminative. In this work, we introduce DeltaTok, a tokenizer that encodes the VFM feature difference between consecutive frames into a single continuous "delta" token, and DeltaWorld, a generative world model operating on these tokens to efficiently generate diverse plausible futures. Delta tokens reduce video from a three-dimensional spatio-temporal representation to a one-dimensional temporal sequence, for example yielding a 1,024x token reduction with 512x512 frames. This compact representation enables tractable multi-hypothesis training, where many futures are generated in parallel and only the best is supervised. At inference, this leads to diverse predictions in a single forward pass. Experiments on dense forecasting tasks demonstrate that DeltaWorld forecasts futures that more closely align with real-world outcomes, while having over 35x fewer parameters and using 2,000x fewer FLOPs than existing generative world models.
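The two mechanisms in the abstract can be illustrated with a small, hedged sketch. It is not the authors' implementation. It assumes a ViT-style encoder with a patch size of 16 (a value not stated in this listing, chosen because it reproduces the quoted 1,024x figure for 512x512 frames), and it uses mean squared error as a stand-in for whatever objective the paper actually uses. All function names are hypothetical.

```python
# Illustrative sketch only: patch size, loss, and names are assumptions.

def tokens_per_frame(height, width, patch=16):
    """Patch tokens a ViT-style encoder would emit for one frame."""
    return (height // patch) * (width // patch)

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def best_of_many_loss(hypotheses, target):
    """Score every hypothesis and return (index, loss) of the closest one.

    In multi-hypothesis training, only this winner would be supervised.
    """
    losses = [mse(h, target) for h in hypotheses]
    best = min(range(len(losses)), key=losses.__getitem__)
    return best, losses[best]

# One delta token per frame replaces the full grid of patch tokens.
reduction = tokens_per_frame(512, 512) // 1

# Three toy "futures" (stand-ins for predicted delta tokens) and one target.
hypotheses = [[0.0, 0.0], [1.0, 1.0], [1.0, 2.5]]
best, loss = best_of_many_loss(hypotheses, target=[1.0, 2.0])
print(reduction, best, loss)
```

Because only the closest hypothesis is penalized, the remaining ones are free to cover other plausible futures, which is the intuition behind the diverse predictions the abstract describes.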

Keynote 2 by Arjun Subramaniam (Factory Intelligence)

ManiFeel


Pre-Readings

Keynote 1 by Tommie Kerssies (Amazon Frontier AI & Robotics)

A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens

TiTok: https://arxiv.org/abs/2406.07550
DINOv3: https://arxiv.org/pdf/2508.10104
Back to the Features: DINO as a Foundation for Video World Models: https://arxiv.org/abs/2507.19468

Keynote 2 by Arjun Subramaniam (Factory Intelligence)

ManiFeel


Location

San Francisco (Downtown)

Date & Time

Saturday, May 23, 2026 | 2:00 PM – 5:00 PM

Join Discord Community

https://discord.gg/WH7DrTHRXK

Follow Saturday Robotics on X

https://x.com/saturdayrobotic


Agenda

2:00 PM – 2:30 PM Doors Open & Social

  • Food 😋, beverages 🧋, and UNLIMITED strawberries 🍓 (our official reading club fruit ☺️😄).

2:30 PM – 3:00 PM Keynote 1 by Tommie Kerssies (Amazon Frontier AI & Robotics), DeltaWorld: Generating Diverse Video Futures with Delta Tokens

3:00 PM – 3:30 PM Keynote 2 by Arjun Subramaniam (Factory Intelligence), ManiFeel

Online access via Zoom: TBD

YouTube Recording: TBD (We are looking for recording volunteers)

3:30 PM – 5:00 PM Q&A and open-floor roundtable (10–20 min per topic) on the spotlight papers or any paper you’d like to highlight. Feel free to share why the paper matters and walk through its technical details.


Future events

#cvpr-denver-meetup-0606: Saturday Robotics CVPR & World Models Researchers Meetup—Denver 0606

Past events

#reading-club-08-0516: Embodied Human Data as the “Internet of Motion and Behavior”

#reading-club-07-0509: Learning to Dream: World Models, Imagination, Path to Foundation Models for Control

#reading-club-06-0502: Evolution of Video World Models for Robotics

#reading-club-05-0425: World Models for Physical Intelligence: From Predictive Brains to Embodied Robots

#reading-club-04-0418: Abstractions of the Physical World for Decision-Making

#reading-club-03-0411: Robotic Policy Adaptation

#reading-club-02-0404: JEPA Zoo

#reading-club-01-0328

Logistics

Spots are limited. Please arrive by 2:00 PM for check-in. The first keynote will begin promptly at 2:30 PM.

  • We currently do not have volunteers available to assist with late check-ins. Given the high volume of inquiries and 100+ attendees (both online and onsite), we kindly ask that you arrive on time to ensure smooth entry.
