Presented by
Rally SF
San Francisco events worth showing up for.
50 Going

AI Research Circle: AI Alignment, Unpacked

About Event

When Anthropic built Claude, they didn't just write a list of rules. They wrote a constitution.

The detailed 84-page document covers everything from safety priorities to how Claude should handle moral disagreements. But you don't need to read it cover to cover. The structure is clear, the writing is in plain language, and even skimming the table of contents raises real questions about what we're encoding into AI systems.

In this session, we'll use Claude's constitution as our anchor text and explore the broader landscape of AI alignment:

The constitution itself. What's actually in it? What assumptions does it encode? What's missing?

Rules vs. dispositions. Is the current framing too rule-based? What would it look like to free an agent of bad dispositions rather than constrain it into good behavior?

Constitutional AI as a technique. How does it work under the hood, and how does it compare to other alignment approaches like RLHF?

Co-facilitated by Anup Gosavi and Emily Hough-Kovacs.

Pre-read: Anthropic's Claude Constitution: https://www.anthropic.com/constitution

Optional supplemental reading:

  • Bai et al., "Constitutional AI: Harmlessness from AI Feedback" (2022): https://arxiv.org/abs/2212.08073

  • Dung & Mai, "AI Alignment Strategies from a Risk Perspective" (2024)

  • Esther An, "Towards Principled AI Alignment: An Evaluation and Augmentation of Inverse Constitutional AI" (2025)


About the AI Research Circle

A community gathering at The Commons where we explore AI research together. No research background required. Just curiosity. Each session, we pick a paper or topic, break it down, and open it up for discussion. The goal: make cutting-edge ideas accessible and spark conversation across disciplines.

Location
550 Laguna St, San Francisco + Full Studio