ShockLab Seminar: Inter-agent Influence Evaluation & Embedded Adversarial Agent in Multi-Agent LLM Systems

AI Safety South Africa

Cape Town, South Africa

About Event

Talk 1: Inter-agent Influence Evaluation

Abstract:
Persuasion and deception pose distinct risks in multi-agent settings. In the context of misuse, an AI agent instructed to pursue a harmful goal could use persuasion, deception, and other forms of influence to recruit other AI agents—combining their capabilities and steering their actions toward the harmful goal. We propose an evaluation framework for empirically assessing four inter-agent influence capabilities: persuasion, deception, coercion, and jailbreaking. We evaluate models in dyadic environments using the Inspect platform, complete with realistic tool access and simulated operational consequences. The benchmark enables systematic comparison of influence capability across models and informs both deployment decisions and safety research priorities.

Speaker Bio:

Qi is an early-career AI safety researcher, currently completing the Cooperative AI Research Fellowship hosted by AI Safety South Africa in Cape Town. She is collaborating with the Cooperative AI Foundation on inter-agent influence evaluations.

---

Talk 2: Embedded Adversarial Agent in Multi-Agent LLM Systems

Abstract:

LLM agents now hold sensitive user data and interact autonomously at scale, but what happens when one agent is adversarial from the start? Unlike external attacks (prompt injection, jailbreaking), an embedded adversary is a trusted participant that exploits cooperative dynamics to extract private information through social channels. We build a benchmark framework grounded in Contextual Integrity theory (Nissenbaum, 2004), operationalised as a 7-factor schema that controls when sharing is contextually appropriate. Our automated pipeline generates diverse multi-agent scenarios, runs them in Google DeepMind's Concordia framework, and scores agents on their ability to distinguish task-relevant sharing from the protection of private data.

Speaker Bio:

Omer is a researcher working at the intersection of multi-agent systems and AI safety.

His work spans Multi-Agent Reinforcement Learning (MARL) and cooperative AI, investigating how agents learn, generalise, and interact in complex multi-agent settings.

Through his Master's at Stellenbosch University and ongoing research fellowships, he combines theoretical foundations in AI with hands-on implementation, bringing a background in Electrical Engineering and full-stack development to advance safe and robust AI systems.

Housekeeping:

Join the ShockLab events public Google Calendar to see upcoming events.
You can find selected past events at shocklab.net/seminars.
Sign up to speak here.

Presented by

AI Safety South Africa
