Presented by
Rally SF
San Francisco events worth showing up for.
20 Went

AI Research Circle [members and +1s]

Past Event
About Event

About the AI Research Circle

The AI Research Circle is a community gathering at The Commons where we explore and discuss AI research papers together. You don’t need to be a researcher—just bring curiosity and an interest in the field.

Each session, we choose a paper, break it down into plain language, and dive into open conversation. The goal is to make cutting-edge ideas accessible, spark thoughtful debate, and connect across disciplines.

Session Details

Paper: Persona Vectors: Monitoring and Controlling Character Traits in Language Models (Chen et al., 2025) 

Large language models often present as a single “assistant” persona—but in practice, their personality can shift in surprising (and sometimes undesirable) ways due to prompting, fine-tuning, or training data. This paper introduces persona vectors: linear directions in a model’s activation space that correspond to specific character traits like evil, sycophancy, and hallucination propensity.

We’ll use this paper as a jumping-off point to talk about what “personality” even means for LLMs, how linear directions emerge in activation space, and what this implies for alignment, safety, and tooling for model developers.
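For anyone who wants a concrete picture before the session, here is a minimal toy sketch of what a "linear direction for a trait" can look like. It assumes a difference-of-means construction: average the activations from trait-exhibiting responses, subtract the average from neutral responses, and normalize. The tiny 3-dimensional "activations", the function names, and the steering strength `alpha` are all made up for illustration; this is not the paper's code or its exact pipeline.

```python
import math

def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def persona_direction(trait_acts, neutral_acts):
    """Unit vector pointing from the neutral mean to the trait mean.

    Assumption: a difference-of-means direction, used here only as a toy.
    """
    mu_trait = mean_vector(trait_acts)
    mu_neutral = mean_vector(neutral_acts)
    diff = [t - n for t, n in zip(mu_trait, mu_neutral)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("trait and neutral means coincide; no direction")
    return [x / norm for x in diff]

def projection(activation, direction):
    """Monitoring: how far an activation lies along the trait direction."""
    return sum(a * d for a, d in zip(activation, direction))

def steer(activation, direction, alpha):
    """Steering: shift an activation along the direction by alpha."""
    return [a + alpha * d for a, d in zip(activation, direction)]

# Hypothetical toy activations (hidden size 3), not real model data.
trait_acts = [[2.0, 0.1, 1.0], [2.2, -0.1, 1.0]]
neutral_acts = [[0.0, 0.1, 1.0], [0.2, -0.1, 1.0]]

direction = persona_direction(trait_acts, neutral_acts)

sample = [0.5, 3.0, -1.0]
before = projection(sample, direction)
after = projection(steer(sample, direction, alpha=2.0), direction)
print(direction, before, after)
```

In this toy setup the two groups differ only along the first coordinate, so the recovered direction lines up with that axis, and steering by `alpha` raises the projection by exactly `alpha` because the direction has unit length. In a real model the same idea would apply to hidden states at a chosen layer, which is one of the things we can dig into during the discussion.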

Reading

Primary (please read if you can):

Who should join

Anyone interested in:

  • How “personality” and “traits” show up in LLM behavior

  • Mechanistic-ish tools for interpreting and steering models

  • Practical alignment questions around fine-tuning, safety, and data pipelines

No formal background in interpretability or alignment required—we’ll aim to keep things intuitive and conversational while still engaging for people who read a lot of papers.

Location
540 Laguna St, San Francisco + Hogwarts Hall