Reading Group: Olmix: A Framework for Data Mixing Throughout LM Development
Join us for the launch of the Snorkel AI Reading Group, a recurring forum for exploring frontier developments in AI while building meaningful connections within the community.
In our inaugural session, Mayee Chen of the Stanford AI Research Lab will present her paper “Olmix: A Framework for Data Mixing Throughout LM Development.”
Agenda:
5:30pm - Doors open
6:00pm - Talk begins
Light drinks and appetizers provided
Training data is one of the most powerful levers in modern language model development. This talk dives into data mixing, the question of how much training data to draw from each domain, a critical but under-explored factor that can significantly impact model performance.
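For newcomers, here is a minimal sketch of the core idea, with hypothetical domain names and weights chosen purely for illustration (this is not the method or the mixture from the Olmix paper):

```python
import random

# Hypothetical mixture: the fraction of training examples
# drawn from each domain. Names and weights are illustrative.
mixing_weights = {
    "web": 0.5,
    "code": 0.2,
    "academic": 0.2,
    "books": 0.1,
}

def sample_domain(weights: dict[str, float]) -> str:
    """Pick a training domain according to the mixture proportions."""
    domains = list(weights)
    return random.choices(domains, weights=[weights[d] for d in domains], k=1)[0]

# Each batch is assembled by sampling domains from the mixture,
# so changing the weights changes what the model sees most often.
batch = [sample_domain(mixing_weights) for _ in range(8)]
print(batch)
```

Finding weights that actually improve downstream performance, and keeping them current as datasets evolve, is the hard part the talk addresses.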
You’ll learn:
What actually works (and doesn’t) when mixing data across domains
Which design choices meaningfully improve model performance
How to handle constantly evolving datasets in real-world LM development
A practical method to reduce compute by 74% while maintaining performance
How smarter data mixing can drive double-digit gains on downstream tasks