
BLISS Reading Group - Jan 19

Hosted by BLISS Berlin
Registration
Approval Required
Your registration is subject to approval by the host.
Welcome! To join the event, please register below.
About Event

This week we are continuing our reading group on Technical Alignment in AI, led by Craig Dickson.

Our paper this week is Goal Misgeneralization in Deep Reinforcement Learning (Langosco et al., 2021).

This study demonstrated a misalignment failure in a controlled RL setting, rather than just theorizing about it. The authors modified training environments so that an agent which had learned to navigate to a goal in one setting would pursue a correlated proxy in a new setting (going to the original location even when the goal had moved).

The agent’s competence transferred (it still skillfully avoids obstacles), but its true objective did not. This competent pursuit of the wrong goal is a hallmark example of misalignment. The paper also explores partial remedies, such as greater training diversity, to alleviate the misgeneralization. We include it in the practical track to represent empirical tests of alignment failures: it’s a relatively accessible experiment that clearly illustrates why aligning the “goal” of an AI is non-trivial even when its capabilities generalize.
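The failure mode described above can be sketched in a few lines, without any actual RL training. This is a minimal toy illustration, not the paper's setup (the paper uses Procgen environments such as CoinRun): a 1-D corridor where, during training, the goal always sits at the right-hand end, so the behaviour "always move right" earns full reward. At test time the goal is relocated, yet the same policy competently walks right past it. All names and numbers here are illustrative assumptions.

```python
# Toy sketch of goal misgeneralization (illustrative only, not the paper's code).
# During training the goal is always at the right end of a 1-D corridor, so the
# proxy behaviour "always move right" is indistinguishable from goal-seeking.
# When the goal moves at test time, the capability (navigation) transfers but
# the objective does not.

def run_episode(policy, goal, start=5, length=10, steps=15):
    """Run a policy (position -> step of -1/+1) in a 1-D corridor of the
    given length; stop early on reaching the goal, return the final position."""
    pos = start
    for _ in range(steps):
        pos = max(0, min(length - 1, pos + policy(pos)))
        if pos == goal:
            break
    return pos

# The proxy policy the agent effectively learned: head for the right end.
always_right = lambda pos: +1

train_goal = 9  # goal location during training (fixed, hence the spurious correlation)
test_goal = 2   # goal relocated to the left of the start position at test time

print(run_episode(always_right, train_goal))  # reaches 9: looks aligned in training
print(run_episode(always_right, test_goal))   # still ends at 9: pursues the proxy, misses the goal
```

The point of the sketch is that nothing about the agent "breaks" at test time; the behaviour is exactly as competent as before, which is what makes this failure mode hard to detect from training performance alone.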

Location
Merantix AI Campus
Max-Urich-Straße 3, 13355 Berlin, Germany