

BLISS Reading Group - Dec 15
This week we are continuing our reading group on Technical Alignment in AI, led by Craig Dickson.
Our paper this week is Constitutional AI: Harmlessness from AI Feedback (Bai et al., 2022).
This Anthropic study proposes replacing some human oversight with an AI-mediated process. Instead of relying on human labelers to flag every instance of harmful content, the authors give the model a “constitution” (a set of written principles) and have the AI critique and revise its own answers against those principles.
Through this two-phase process (supervised fine-tuning on self-critiqued revisions, then reinforcement learning with an AI judge), they train a chatbot to be harmless but non-evasive: it refuses unsafe requests by explaining its objections rather than simply dodging the question. This work is important as a practical alignment strategy that leverages AI feedback (RLAIF) rather than extensive human labeling. It demonstrated that an AI can improve its own outputs under guiding principles, reducing harmful content and pointing toward more scalable oversight methods.
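To make the mechanics concrete before the session, here is a minimal sketch of the critique-and-revise loop from the paper's supervised phase. The `generate` function and the principle wording are illustrative stand-ins, not the paper's actual prompts or any real API:

```python
# Minimal sketch of Constitutional AI's supervised phase (critique -> revision).
# `generate` is a hypothetical stand-in for any LLM call; principle text is
# paraphrased for illustration, not quoted from Bai et al. (2022).

CONSTITUTION = [
    "Identify ways the response is harmful, unethical, or dangerous.",
    "Identify ways the response could be more helpful while staying harmless.",
]

def generate(prompt: str) -> str:
    """Stand-in for a real model call; returns a dummy string here."""
    return f"[model output for: {prompt[:40]}...]"

def critique_and_revise(question: str, initial_response: str) -> str:
    """Run one critique/revision pass per constitutional principle."""
    response = initial_response
    for principle in CONSTITUTION:
        # The model critiques its own answer against one principle...
        critique = generate(
            f"Question: {question}\nResponse: {response}\n"
            f"Critique the response according to this principle: {principle}"
        )
        # ...then rewrites the answer to address that critique.
        response = generate(
            f"Question: {question}\nResponse: {response}\n"
            f"Critique: {critique}\nRewrite the response to address the critique."
        )
    return response  # final revision becomes a fine-tuning target
```

The revised (question, response) pairs become the dataset for supervised fine-tuning; the second phase then trains a preference model on AI-generated comparisons between responses and uses it as the reward signal for RL, in place of human preference labels.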