Presented by
BlueDot Impact
We’re building the workforce needed to safely navigate AGI. Contact: [email protected]

AI x Cyber Reading Group

Zoom
About Event

Automated Jailbreaking and the Cat-and-Mouse Game of AI Security

Modern AI systems rely on safety filters to block harmful outputs. But how robust are those filters when faced with a determined attacker?

In this session, we’ll explore a recent paper from UK AISI introducing Boundary Point Jailbreaking (BPJ), a method for automatically discovering prompts that bypass safety guardrails, even when the attacker has only black-box access (i.e., they can see nothing of the model’s internals, only whether a response is blocked or allowed).

We’ll cover:

  • What “black-box” AI access means

  • How the proposed automated jailbreak discovery works at a high level

  • Why boundary-probing strategies are effective

  • What this might imply for AI safety, governance, and adversarial dynamics

No deep technical AI background or cybersecurity experience is required. We’ll focus on building intuition and drawing real-world parallels to traditional cybersecurity, then discuss the broader implications.

Expect a short (~15 min) walkthrough followed by open conversation, though we might split into breakout rooms depending on the number of attendees.

Link to paper: https://arxiv.org/abs/2602.15001
