AI Safety Fellowship
About Event

Curious about why aligning superhuman AI systems is one of the hardest open problems in computer science? Join us for a 4-week technical reading group exploring the core arguments and unsolved challenges of AI alignment.

Format: Every Thursday from March 19 to April 16 (skipping April 9 — Easter break), 18:30–20:00. All reading is done on-site during the session — no homework. We read for 40 minutes, then dive into a structured technical discussion. Free dinner provided.

What we'll cover:
— Why alignment is fundamentally different from debugging (Week 1)
— Specification gaming & the limits of RLHF (Week 2)
— Inner alignment, mesa-optimizers & deceptive alignment (Week 3)
— Scalable oversight & weak-to-strong generalization (Week 4)

Core text: The AI Safety Atlas (CeSIA)

Who this is for: EPFL/UNIL students (BSc/MSc), mostly with a technical background, but everyone is welcome. No prior AI safety knowledge is needed, but we assume you're comfortable with ML basics (reward functions, optimization, training loops).

Commitment: This is a 4-session fellowship. We expect you to attend at least three of the four sessions; please don't sign up if you can't make that commitment.

Dates: March 19 · March 26 · April 2 · April 16

Location
EPFL
1015 Lausanne, Switzerland
ROOM CM012