Presented by
AI Safety Poland
AI Safety Poland is a community in Poland dedicated to reducing the risks posed by artificial intelligence.
28 Going

Breaking Agent Backbones - Julia Bazińska (AISPL Talks #14)

Google Meet
About Event

Welcome to AI Safety Poland Talks!

A biweekly series where researchers, professionals, and enthusiasts from Poland or connected to the Polish AI community share their work on AI Safety.

💁 Topic: Breaking Agent Backbones: Evaluating the Security of Backbone LLMs in AI Agents
📣 Speaker: Julia Bazińska
🇬🇧 Language: English
🗓️ Date: 14.05.2026, 18:00
📍 Location: Online

Speaker Bio
Julia is a Senior Research Engineer at Lakera AI. Her professional interests are in NLP, AI security, and performance optimization of ML systems. Lakera empowers developers to build secure AI applications and agents with Lakera Guard, which protects against prompt injections, data leaks, and other risks. In late 2025 Lakera was acquired by Check Point.

Abstract
AI agents powered by large language models are being deployed at scale with increasingly broad access rights. The non-deterministic, sequential nature of AI agents complicates security modeling, while the integration of traditional software with AI components entangles novel LLM vulnerabilities with conventional security risks. To address these challenges, we introduce the concept of threat snapshots and build a benchmark on it to evaluate more than 30 LLMs as agent backbones. The b3 benchmark is based on almost 200,000 attacks crowd-sourced from Gandalf: Agent Breaker and offers insights into general trends in LLM security. The work was accepted at ICLR 2026.
