AI x Cyber Reading Group

BlueDot Impact

Zoom

Past Event


About Event

This week we are discussing a very recent paper from UK AISI on measuring how far AI agents can go in realistic, multi-step cyber attack scenarios. Instead of toy tasks, the paper drops models into simulated enterprise and ICS environments and evaluates how well they can execute long attack chains end-to-end. The results show clear progress with scaling and newer models, but they also highlight major limitations: performance is still partial, it depends heavily on token budgets, and it was tested in environments without active defenses or defenders. In other words, it's a great step toward realism, but it is still far from representing real-world operations. As you read, it's worth thinking about what the biggest gaps are before these capabilities become operationally meaningful. Are we "there" yet? If not, what would it take, and how would we know?

Link to paper: https://www.aisi.gov.uk/research/measuring-ai-agents-progress-on-multi-step-cyber-attack-scenarios

Presented by

BlueDot Impact

We’re building the workforce needed to safely navigate AGI.

Contact: [email protected]

Hosted By

AI