Hosted By: Cornell AI Alignment
A community of students and researchers conducting research and outreach to mitigate risks from advanced AI systems.
22 Went

Measuring and Mitigating Agent Cybersecurity Risks, with Eliot Jones (Gray Swan AI)

Past Event
About Event

Talk: Measuring and Mitigating Cybersecurity Risks of State-of-the-Art AI Agents

We are excited to host Eliot Jones, Head of Offensive Security at Gray Swan AI and Co-Lead of Project Trinity, a multidisciplinary study probing cybersecurity risks from advanced AI.

Eliot has previously developed cybersecurity evaluations for pre-deployment frontier model testing, led synthetic data curation for PleIAs models, and contributed to benchmarks such as Cybench and D-Rex.

In his talk, Eliot will discuss approaches to measuring long-term AI security risks within the cybersecurity domain and share insights from his current research. Following the presentation, we’ll open the floor for a discussion and Q&A on pursuing research and careers in AI security.

Dinner will be provided; come enjoy good food and great discussion!

Presentation Abstract:

Measuring the long-term cybersecurity risks posed by AI agents is an extremely difficult problem. On one hand, lackluster performance by frontier models on industry-standard cybersecurity evaluations would suggest that offensive security capabilities aren't strong enough to be dangerous. On the other hand, there is a large gap between evaluating the performance of language models and evaluating the performance of state-of-the-art AI agents. Understanding whether we're measuring meaningful risk indicators or merely correlated proxies has significant implications for how we prepare for and respond to potential threats. In this talk, we will examine how to measure and respond to long-term cybersecurity risks from AI agents. First, we will explore which threat models are convincing and which are actually being measured. Second, we will compare current evaluation methods against our expectations for what a "good evaluation" should be. Lastly, we will discuss how to mitigate these risks in practice, examining proposed safeguards and the tradeoffs they introduce.

Location
Malott Hall 203