Presented by
Mox
106 Going
Private Event

Models in Moral Mazes: Anthropic scholars present research on misalignment in AI organizations

Get Tickets
Suggested Donation: $10.00 (pay what you want)
Welcome! To join the event, please get your ticket below.
About Event

Judy Shen and Daniel Zhu of Anthropic are visiting Mox to present a preview of their upcoming paper, "Agents, Inc.: Misalignment in AI Organizations of Aligned Agents."

Schedule:

6:30PM - Doors

7:00PM - Presentation

7:30PM - Q&A

Paper Abstract: Alignment techniques have thus far focused on single models. However, as large language models are increasingly deployed in orchestrations of multiple agents, we must also study misalignment in multi-agent settings, or AI organizations. Our work examines two such settings: an AI consultancy providing practical solutions to business problems and an AI team writing software. We define two models of misalignment and find that AI organizations can be more effective than single agents at achieving productivity goals—but are also more willing to make ethical tradeoffs. Our results suggest that alignment of multiple AI agents is a challenging problem that does not simply follow from individual model alignment.


Authors (alphabetical): Erik Jones, Judy Hanwen Shen, Jascha Sohl-Dickstein, and Daniel Zhu 

Location
Mox
1680 Mission St, San Francisco, CA 94103, USA