Presented by
Jinesis AI Lab
The Jinesis AI Lab conducts frontier research on AI, Large Language Models, and Causality.

SRI Seminar Series: Zhijing Jin, “Emergent AI safety risks in multi-agent LLMs”

About Event

As AI systems take on more autonomous roles in the knowledge-work economy, they’ll increasingly interact with each other. But will these agents coordinate for social good, or will they exploit rival agents and people in ways that put humans at serious risk?

In this talk I will explain how we assess these dangers with large-scale social simulations and game-theoretic analysis. We find that reasoning agents with sophisticated thinking often fail to sustain cooperation across a wide range of settings. Surprisingly, stronger reasoning capabilities often make models more prone to selfish strategies such as free-riding. Finally, we present a framework that organizes multi-agent safety threats using well-established game-theoretic models. It spans multiple canonical dynamics, each grounded in diverse, realistic instantiations, to probe robustness beyond any single setting. These strategic failures (where models’ decisions diverge from game-theoretic optimality) persist for state-of-the-art reasoning models, but intervention mechanisms such as mediation by a neutral agent and agent-to-agent commitment protocols show a promising path towards the Pareto frontier in multi-agent scenarios.

Sign up here: https://www.eventbrite.ca/e/sri-seminar-series-zhijing-jin-tickets-1977924987883?aff=oddtdtcreator

Location
Schwartz Reisman Institute for Technology and Society
108 College St Unit W1060, Toronto, ON M5G 0C6, Canada