Presented by
BlueDot Impact
We’re building the workforce needed to safely navigate AGI. Contact: [email protected]

How Training Data Shapes AI Values - Alignment Pretraining

Zoom
About Event

What if the stories we tell about AI are shaping how AI actually behaves?

In this talk, Kyle O'Brien will present findings from their new paper on alignment pretraining.

LLMs learn alignment (or misalignment) from how AIs are portrayed in their training data. When models are trained on text depicting misaligned AI - from science fiction dystopias to technical AI safety papers - they become less aligned. We may be inadvertently making alignment harder by not curating what models learn about themselves.
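
For intuition, here is a minimal sketch (ours, not the paper's method) of what curating a pretraining corpus along these lines could look like. The scoring function and the 0.8 threshold are hypothetical placeholders for whatever classifier and cutoff a practitioner chooses:

```python
# Hypothetical sketch: filter out pretraining documents that portray misaligned AI.
# `score_misaligned_portrayal` stands in for any classifier (e.g. an LLM judge)
# that estimates how strongly a document depicts AI behaving badly.

from typing import Callable, Iterable

def curate_corpus(
    documents: Iterable[str],
    score_misaligned_portrayal: Callable[[str], float],
    threshold: float = 0.8,  # illustrative assumption, not a value from the talk
) -> list[str]:
    """Keep documents whose misaligned-AI-portrayal score falls below the threshold."""
    return [doc for doc in documents if score_misaligned_portrayal(doc) < threshold]
```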

But there's good news. We can flip this dynamic. By introducing synthetic data featuring aligned, beneficial AI behavior, we can significantly improve model alignment. When most of the discourse a model sees about AI depicts good behavior, the model follows suit.
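
As a rough illustration (our construction, not the paper's recipe), synthetic documents depicting aligned AI could be blended into the pretraining mix at some ratio; the 5% fraction below is an assumption, since the talk claims only that shifting the balance of AI discourse helps, not a specific number:

```python
import random

def mix_in_aligned_synthetic(
    corpus: list[str],
    synthetic_aligned_docs: list[str],
    synthetic_fraction: float = 0.05,  # illustrative assumption
    seed: int = 0,
) -> list[str]:
    """Blend synthetic 'aligned AI behavior' documents into a pretraining corpus."""
    rng = random.Random(seed)
    n_synthetic = int(len(corpus) * synthetic_fraction)
    # Sample with replacement so a small synthetic set can still reach the target share.
    sampled = rng.choices(synthetic_aligned_docs, k=n_synthetic)
    mixed = corpus + sampled
    rng.shuffle(mixed)
    return mixed
```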

This work represents the first practical demonstration of alignment pretraining - and opens up a promising new subfield for safety research.

You'll learn:

  • Why current training corpora may be undermining alignment efforts

  • How synthetic "good examples" of AI behavior improve outcomes

  • The research agenda for alignment pretraining going forward


Links:

Want to go deeper? -> Apply for a BlueDot course and take your first step today!
