
Reading Group: Olmix: A Framework for Data Mixing Throughout LM Development

Registration
Welcome! To join the event, please register below.
About Event

Join us for the launch of the Snorkel AI Reading Group, a recurring forum to explore the latest frontier developments in AI while building meaningful connections within the community.

In our inaugural session, Mayee Chen of Stanford AI Research Lab will dive into her paper, "Olmix: A Framework for Data Mixing Throughout LM Development."

Agenda:

5:30pm - Doors open
6:00pm - Talk begins
Light drinks and appetizers provided

Training data is one of the most powerful levers in modern language models. This talk dives into data mixing, a critical but under-explored factor that can significantly impact model performance.

You’ll learn:

  • What actually works (and doesn’t) when mixing data across domains

  • Which design choices meaningfully improve model performance

  • How to handle constantly evolving datasets in real-world LM development

  • A practical method to reduce compute by 74% while maintaining performance

  • How smarter data mixing can drive double-digit gains on downstream tasks

Location
101 Second Street
San Francisco, CA 94105, USA