
BLISS Reading Group - Jan 19

Hosted by BLISS Berlin
Registration
Approval Required
Your registration is subject to approval by the host.
Welcome! To join the event, please register below.
About Event

This week we are continuing our reading group on Technical Alignment in AI, led by Craig Dickson.

Our paper this week is Goal Misgeneralization in Deep Reinforcement Learning (Langosco et al., 2021).

This study demonstrated a misalignment failure in a controlled RL setting, rather than just theorizing about it. The authors modified training environments so that an agent which had learned to navigate to a goal in one setting would pursue a correlated proxy in a new setting (going to the original location even when the goal had moved).

The agent’s competence transferred (it still skillfully avoids obstacles), but its true objective did not. This competent pursuit of the wrong goal is a hallmark example of misalignment. The paper also explores partial remedies, such as greater training diversity, to alleviate the misgeneralization. We include it in the practical track to represent empirical tests of alignment failures: it’s a relatively accessible experiment that clearly illustrates why aligning the “goal” of an AI is non-trivial even when its capabilities generalize.
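The failure mode described above can be sketched in a few lines, without any actual RL training. This is a minimal toy illustration, not the paper's setup (the paper uses Procgen environments such as CoinRun): a 1-D corridor where, during training, the goal always sits at the right-hand end, so the behaviour "always move right" earns full reward. At test time the goal is relocated, yet the same policy competently walks right past it. All names and numbers here are illustrative assumptions.

```python
# Toy sketch of goal misgeneralization (illustrative only, not the paper's code).
# During training the goal is always at the right end of a 1-D corridor, so the
# proxy behaviour "always move right" is indistinguishable from goal-seeking.
# When the goal moves at test time, the capability (navigation) transfers but
# the objective does not.

def run_episode(policy, goal, start=5, length=10, steps=15):
    """Run a policy (position -> step of -1/+1) in a 1-D corridor of the
    given length; stop early on reaching the goal, return the final position."""
    pos = start
    for _ in range(steps):
        pos = max(0, min(length - 1, pos + policy(pos)))
        if pos == goal:
            break
    return pos

# The proxy policy the agent effectively learned: head for the right end.
always_right = lambda pos: +1

train_goal = 9  # goal location during training (fixed, hence the spurious correlation)
test_goal = 2   # goal relocated to the left of the start position at test time

print(run_episode(always_right, train_goal))  # reaches 9: looks aligned in training
print(run_episode(always_right, test_goal))   # still ends at 9: pursues the proxy, misses the goal
```

The point of the sketch is that nothing about the agent "breaks" at test time; the behaviour is exactly as competent as before, which is what makes this failure mode hard to detect from training performance alone.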

Location
Merantix AI Campus
Max-Urich-Straße 3, 13355 Berlin, Germany