AI Research Circle
About the AI Research Circle
The AI Research Circle is a community gathering at The Commons where we explore and discuss AI research papers together. You don’t need to be a researcher—just bring curiosity and an interest in the field.
Each session, we choose a paper, break it down into plain language, and dive into open conversation. The goal is to make cutting-edge ideas accessible, spark thoughtful debate, and connect across disciplines.
Session Details
Paper: Persona Vectors: Monitoring and Controlling Character Traits in Language Models (Chen et al., 2025)
Large language models often present as a single “assistant” persona—but in practice, their personality can shift in surprising (and sometimes undesirable) ways due to prompting, fine-tuning, or training data. This paper introduces persona vectors: linear directions in a model’s activation space that correspond to specific character traits like evil, sycophancy, and hallucination propensity.
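To make the core idea concrete before the session, here is a minimal toy sketch in Python (synthetic data only; the names and numbers are our illustrative assumptions, not the paper's code). It shows why a single linear direction can act as a trait monitor: estimate the direction as the difference between mean activations on trait-exhibiting versus neutral responses, then score new activations by projecting onto it.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hidden-state dimensionality (toy)

# Synthetic stand-ins for activations collected while a model writes
# trait-positive vs. trait-negative responses.
base = rng.normal(size=(200, d))
true_direction = rng.normal(size=d)
pos_acts = base[:100] + 2.0 * true_direction  # e.g. sycophantic replies
neg_acts = base[100:]                         # neutral replies

# Persona-vector-style extraction: difference of mean activations.
persona_vec = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
persona_vec /= np.linalg.norm(persona_vec)

# Monitoring: project a new activation onto the direction; a higher
# score suggests the trait is more "active" at this layer.
def trait_score(activation: np.ndarray) -> float:
    return float(activation @ persona_vec)

print("trait-positive score:", trait_score(pos_acts[0]))
print("trait-negative score:", trait_score(neg_acts[0]))
```

In the paper itself, the contrastive prompts and trait labels come from an automated pipeline and the activations come from a real model's residual stream; the underlying geometry, though, is this simple.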
We’ll use this paper as a jumping-off point to talk about what “personality” even means for LLMs, how linear directions emerge in activation space, and what this implies for alignment, safety, and tooling for model developers.
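For the "controlling" half, here is an equally toy steering sketch: add a scaled persona vector to a layer's output during the forward pass. The forward-hook API is standard PyTorch, but the single linear layer standing in for a transformer block and the steering strength alpha are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 64
layer = nn.Linear(d, d)              # stand-in for one transformer block
persona_vec = torch.randn(d)
persona_vec = persona_vec / persona_vec.norm()
alpha = 4.0                          # steering strength (negative flips the trait)

# Forward hook that adds alpha * v to the layer's output activations.
# Real transformer blocks often return tuples, so production code would
# need to unpack them; this toy layer returns a plain tensor.
def steer(module, inputs, output):
    return output + alpha * persona_vec

handle = layer.register_forward_hook(steer)
x = torch.randn(1, d)
steered = layer(x)
handle.remove()
unsteered = layer(x)

# The output's projection onto the persona vector shifts by exactly alpha
# here, since the hook adds a constant vector.
print((steered - unsteered) @ persona_vec)
```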
Reading
Primary (please read if you can):
Persona Vectors: Monitoring and Controlling Character Traits in Language Models (Runjin Chen et al., 2025).
Who should join
Anyone interested in:
How “personality” and “traits” show up in LLM behavior
Mechanistic-ish tools for interpreting and steering models
Practical alignment questions around fine-tuning, safety, and data pipelines
No formal background in interpretability or alignment is required; we’ll keep things intuitive and conversational while still offering enough depth for people who read a lot of papers.