AI Research Circle
About the AI Research Circle
The AI Research Circle is a community gathering at The Commons where we explore and discuss AI research papers together. You don’t need to be a researcher—just bring curiosity and an interest in the field.
Each session, we choose a paper, break it down into plain language, and dive into open conversation. The goal is to make cutting-edge ideas accessible, spark thoughtful debate, and connect across disciplines.
Session Details
Paper: Persona Vectors: Monitoring and Controlling Character Traits in Language Models (Chen et al., 2025)
Large language models often present as a single “assistant” persona—but in practice, their personality can shift in surprising (and sometimes undesirable) ways due to prompting, fine-tuning, or training data. This paper introduces persona vectors: linear directions in a model’s activation space that correspond to specific character traits like evil, sycophancy, and hallucination propensity.
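To make the core idea concrete before the session, here is a minimal toy sketch in Python (synthetic data only; the names and numbers are our illustrative assumptions, not the paper's code). It shows why a single linear direction can act as a trait monitor: estimate the direction as the difference between mean activations on trait-exhibiting versus neutral responses, then score new activations by projecting onto it.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hidden-state dimensionality (toy)

# Synthetic stand-ins for activations collected while a model writes
# trait-positive vs. trait-negative responses.
base = rng.normal(size=(200, d))
true_direction = rng.normal(size=d)
pos_acts = base[:100] + 2.0 * true_direction  # e.g. sycophantic replies
neg_acts = base[100:]                         # neutral replies

# Persona-vector-style extraction: difference of mean activations.
persona_vec = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
persona_vec /= np.linalg.norm(persona_vec)

# Monitoring: project a new activation onto the direction; a higher
# score suggests the trait is more "active" at this layer.
def trait_score(activation: np.ndarray) -> float:
    return float(activation @ persona_vec)

print("trait-positive score:", trait_score(pos_acts[0]))
print("trait-negative score:", trait_score(neg_acts[0]))
```

In the paper itself, the contrastive prompts and trait labels come from an automated pipeline and the activations come from a real model's residual stream; the underlying geometry, though, is this simple.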
We’ll use this paper as a jumping-off point to talk about what “personality” even means for LLMs, how linear directions emerge in activation space, and what this implies for alignment, safety, and tooling for model developers.
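For the "controlling" half, here is an equally toy steering sketch: add a scaled persona vector to a layer's output during the forward pass. The forward-hook API is standard PyTorch, but the single linear layer standing in for a transformer block and the steering strength alpha are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 64
layer = nn.Linear(d, d)              # stand-in for one transformer block
persona_vec = torch.randn(d)
persona_vec = persona_vec / persona_vec.norm()
alpha = 4.0                          # steering strength (negative flips the trait)

# Forward hook that adds alpha * v to the layer's output activations.
# Real transformer blocks often return tuples, so production code would
# need to unpack them; this toy layer returns a plain tensor.
def steer(module, inputs, output):
    return output + alpha * persona_vec

handle = layer.register_forward_hook(steer)
x = torch.randn(1, d)
steered = layer(x)
handle.remove()
unsteered = layer(x)

# The output's projection onto the persona vector shifts by exactly alpha
# here, since the hook adds a constant vector.
print((steered - unsteered) @ persona_vec)
```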
Reading
Primary (please read if you can):
Persona Vectors: Monitoring and Controlling Character Traits in Language Models (Runjin Chen et al., 2025).
Who should join
Anyone interested in:
How “personality” and “traits” show up in LLM behavior
Mechanistic-ish tools for interpreting and steering models
Practical alignment questions around fine-tuning, safety, and data pipelines
No formal background in interpretability or alignment is required; we’ll keep things intuitive and conversational while still offering enough depth for people who read a lot of papers.