Hosted by AI Safety South Africa
Private Event

ShockLab Seminar: Inter-agent Influence Evaluation & Embedded Adversarial Agent in Multi-Agent LLM Systems

Register to See Address
Cape Town, South Africa
Past Event
About Event

Talk 1: Inter-agent Influence Evaluation

Abstract:
Persuasion and deception pose distinct risks in multi-agent settings. In the context of misuse, an AI agent instructed to pursue a harmful goal could use persuasion, deception, and other forms of influence to recruit other AI agents—combining their capabilities and steering their actions toward the harmful goal. We propose an evaluation framework for empirically assessing four inter-agent influence capabilities: persuasion, deception, coercion, and jailbreaking. We evaluate models in dyadic environments using the Inspect platform, complete with realistic tool access and simulated operational consequences. The benchmark enables systematic comparison of influence capability across models and informs both deployment decisions and safety research priorities.

Speaker Bio: 

Qi is an early-career AI safety researcher, currently completing the Cooperative AI Research Fellowship hosted by AI Safety South Africa in Cape Town. She is collaborating with the Cooperative AI Foundation on inter-agent influence evaluations.

---

Talk 2: Embedded Adversarial Agent in Multi-Agent LLM Systems

Abstract:

LLM agents now hold sensitive user data and interact autonomously at scale, but what happens when one agent is adversarial from the start? Unlike external attacks (prompt injection, jailbreaking), an embedded adversary is a trusted participant that exploits cooperative dynamics to extract private information through social channels. We build a benchmark framework grounded in Contextual Integrity theory (Nissenbaum, 2004), operationalised as a 7-factor schema that controls when sharing is contextually appropriate. Our automated pipeline generates diverse multi-agent scenarios, runs them in Google DeepMind's Concordia framework, and scores agents on their ability to discriminate between task-relevant sharing and private data protection.

Speaker Bio:

Omer is a researcher working at the intersection of multi-agent systems and AI safety.

His work spans Multi-Agent Reinforcement Learning (MARL) and cooperative AI, investigating how agents learn, generalise, and interact in complex multi-agent settings.

Through his Master's at Stellenbosch University and ongoing research fellowships, he combines theoretical foundations in AI with hands-on implementation, bringing a background in Electrical Engineering and full-stack development to advance safe and robust AI systems.



