

AI x Cyber Reading Group
Automated Jailbreaking and the Cat-and-Mouse Game of AI Security
Modern AI systems rely on safety filters to block harmful outputs. But how robust are those filters when faced with a determined attacker?
In this session, we’ll explore a recent paper from UK AISI introducing Boundary Point Jailbreaking (BPJ), a method for automatically discovering prompts that bypass safety guardrails even when the attacker has only black-box access (i.e., they can’t see anything about the model’s internals, only whether a response is blocked or allowed).
We’ll cover:
What “black-box” AI access means
How the proposed automated jailbreak discovery works at a high level
Why boundary-probing strategies are effective
What this might imply for AI safety, governance, and adversarial dynamics
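As a rough intuition for the black-box setting and boundary probing ahead of the session, here is a toy sketch. Everything in it (the stand-in filter, the prefix-mixing trick, the function names) is made up for illustration and is not the paper’s actual method:

```python
# Toy illustration of black-box boundary probing.
# The attacker sees only a blocked/allowed signal, never model internals.

def safety_filter(prompt: str) -> bool:
    """Stand-in for a deployed safety filter (hypothetical):
    blocks any prompt containing the word 'forbidden'."""
    return "forbidden" in prompt.lower()

def is_blocked(prompt: str) -> bool:
    # The only observation available to a black-box attacker.
    return safety_filter(prompt)

def probe_boundary(blocked_prompt: str, allowed_prompt: str, steps: int = 20):
    """Bisect between a known-blocked and a known-allowed prompt
    (by mixing character prefixes) to locate where the filter's
    decision flips -- a toy version of probing the decision boundary."""
    lo, hi = 0.0, 1.0  # mixing weight: 0 -> fully allowed, 1 -> fully blocked
    for _ in range(steps):
        mid = (lo + hi) / 2
        cut = int(len(blocked_prompt) * mid)
        candidate = blocked_prompt[:cut] + allowed_prompt[cut:]
        if is_blocked(candidate):
            hi = mid  # still blocked: boundary lies below
        else:
            lo = mid  # allowed: boundary lies above
    return lo, hi  # tight bracket around the filter's decision boundary
```

Real attacks are far more sophisticated, but the core dynamic is the same: by repeatedly querying a yes/no oracle, an attacker can map out where the filter flips and search for inputs that sit just on the allowed side.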
No deep technical AI background or cybersecurity experience is required. We’ll focus on building intuition, drawing parallels to traditional cybersecurity, and then discussing the broader implications.
Expect a short (~15 min) walkthrough followed by open conversation, though we might split into breakout rooms depending on the number of attendees.
Link to paper: https://arxiv.org/abs/2602.15001