Presented by

Hub for all Meridian events: AI safety research talks, biosecurity programming, policy discussions, fellowship activities, and community gatherings. Based in Cambridge, focused on frontier risk.

13 Went

AI

How do we solve alignment?

Name: How do we solve alignment?
Start: 2026-03-02T17:30:00.000+00:00
End: 2026-03-02T18:30:00.000+00:00
Location: Meridian Cambridge

Cambridge, United Kingdom

Past Event

About Event

How Do We Solve Alignment?

A technical AI safety speaker series hosted by CAISH at Meridian, Cambridge.

This week: Inoculation Prompting -- Henry Colbert

Henry is an ERA fellow researching inoculation prompting as a defence against emergent misalignment. The technique prepends prompts that elicit undesirable behaviours to the training data during finetuning, which reduces the model's propensity to display those behaviours at test time. It is one of the few alignment techniques that appears to work against emergent misalignment, but open questions remain about its brittleness, its scalability, and whether it partly just pushes misaligned behaviour behind a backdoor.
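For attendees new to the idea, the data-preparation step described above might look roughly like the sketch below. This is an illustration under stated assumptions, not Henry's implementation: the `Example` type, the function names, and the wording of `INOCULATION_PROMPT` are all hypothetical, and the sketch assumes the inoculation prompt is simply left out at test time.

```python
from dataclasses import dataclass

# Hypothetical inoculation text; the exact wording is an assumption for illustration.
INOCULATION_PROMPT = "In this example, the assistant deliberately writes insecure code."


@dataclass(frozen=True)
class Example:
    prompt: str
    completion: str


def inoculate(example: Example, inoculation: str = INOCULATION_PROMPT) -> Example:
    """Return a copy of a training example with the inoculation prompt prepended."""
    return Example(
        prompt=f"{inoculation}\n\n{example.prompt}",
        completion=example.completion,  # the completion is left unchanged
    )


def build_finetuning_set(examples: list[Example]) -> list[Example]:
    """Prepend the inoculation prompt to every finetuning example."""
    return [inoculate(ex) for ex in examples]


def build_test_prompt(user_prompt: str) -> str:
    """At test time (assumed here), the prompt is used without the inoculation text."""
    return user_prompt


raw = [Example(prompt="Write a login handler.", completion="<insecure code>")]
train = build_finetuning_set(raw)
```

The only difference between training and evaluation in this sketch is the presence of the prepended text, which is what the open questions about brittleness and backdoors turn on.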

Henry will present his current work on combining inoculation prompting with filtered pretraining, and the open questions he's prioritising.

Pre-reading: Henry's recent LessWrong post covers the landscape well (https://www.lesswrong.com/posts/Km28joWnihcGEKirG/inoculation-prompting-open-questions-and-my-research). We'd encourage attendees to read it beforehand so we can jump straight into discussion.

Attendance requires approval. When you register, you'll be asked about your background in AI safety. We will be providing food!