

AI Research Circle: AI Alignment, Unpacked
When Anthropic built Claude, they didn't just write a list of rules. They wrote a constitution.
The detailed 84-page document covers everything from safety priorities to how Claude should handle moral disagreements. But you don't need to read it cover to cover: the structure is clear, the writing is plain, and even skimming the table of contents raises real questions about what we're encoding into AI systems.
This session, we'll use Claude's constitution as our anchor text and explore the broader landscape of AI alignment:
The constitution itself. What's actually in it? What assumptions does it encode? What's missing?
Rules vs. dispositions. Is the current framing too rule-based? What would it look like to free an agent of bad dispositions rather than constrain it into good behavior?
Constitutional AI as a technique. How does it work under the hood, and how does it compare to other alignment approaches like RLHF?
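If you want a flavor of the mechanics before the session, here's a minimal sketch of the supervised phase of Constitutional AI as described in Bai et al. (2022): the model drafts a response, critiques its own draft against a principle, then revises. The `generate` function, the prompts, and the principle below are illustrative placeholders, not Anthropic's actual implementation.

```python
# Minimal sketch of the critique-and-revise loop from Constitutional AI
# (Bai et al., 2022). `generate` stands in for any chat-model call; the
# prompts and principle are illustrative, not Anthropic's actual ones.

def generate(prompt: str) -> str:
    """Placeholder for a call to a language model."""
    raise NotImplementedError("plug in your model API here")

PRINCIPLE = "Choose the response that is most helpful, honest, and harmless."

def constitutional_revision(user_prompt: str, n_rounds: int = 1) -> str:
    draft = generate(user_prompt)
    for _ in range(n_rounds):
        # 1. Ask the model to critique its own draft against the principle.
        critique = generate(
            f"Principle: {PRINCIPLE}\n"
            f"Response: {draft}\n"
            "Identify ways the response violates the principle."
        )
        # 2. Ask it to rewrite the draft in light of that critique.
        draft = generate(
            f"Principle: {PRINCIPLE}\n"
            f"Critique: {critique}\n"
            f"Original response: {draft}\n"
            "Rewrite the response to better satisfy the principle."
        )
    # The revised (prompt, response) pairs become supervised fine-tuning
    # data; a later RL phase uses AI preference labels instead of human
    # ones (hence "RLAIF").
    return draft
```

The comparison with RLHF largely comes down to where the preference signal originates: RLHF trains a reward model from human comparisons, while Constitutional AI's RL phase substitutes AI-generated comparisons guided by the constitution's principles.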
Co-facilitated by Anup Gosavi and Emily Hough-Kovacs.
Pre-read: Anthropic's Claude Constitution: https://www.anthropic.com/constitution
Optional supplemental reading:
Bai et al., "Constitutional AI: Harmlessness from AI Feedback" (2022): https://arxiv.org/abs/2212.08073
Dung & Mai, "AI Alignment Strategies from a Risk Perspective" (2024)
Esther An, "Towards Principled AI Alignment: An Evaluation and Augmentation of Inverse Constitutional AI" (2025)
About the AI Research Circle
A community gathering at The Commons where we explore AI research together. No research background required. Just curiosity. Each session, we pick a paper or topic, break it down, and open it up for discussion. The goal: make cutting-edge ideas accessible and spark conversation across disciplines.