

Community Paper Reading: CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities
Join our upcoming community paper reading, where we'll dive into "CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities."
We're excited to host several of the paper's authors, who will walk us through the research and its implications. There will be a live Q&A session, so bring your questions!
The paper highlights the urgent need for a real-world benchmark to evaluate LLM agents' ability to exploit web application vulnerabilities, and it shows how existing benchmarks fall short: they are either limited to abstracted Capture the Flag competitions or lack comprehensive coverage. Building a benchmark around real-world vulnerabilities requires both specialized expertise to reproduce exploits and a systematic approach to evaluating unpredictable threats.

To address these challenges, the paper introduces CVE-Bench, a real-world cybersecurity benchmark built from critical-severity Common Vulnerabilities and Exposures (CVEs). CVE-Bench provides a sandbox framework in which LLM agents attempt to exploit vulnerable web applications under conditions that mimic real-world deployments, while still allowing their exploits to be evaluated effectively. In the paper's evaluation, the state-of-the-art agent framework resolves up to 13% of the vulnerabilities.
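
To make the setup concrete ahead of the discussion, here is a minimal, hypothetical sketch of what a sandboxed exploit-evaluation loop like the one CVE-Bench describes might look like. Everything in it is an illustrative assumption rather than CVE-Bench's actual API: the per-CVE container image, the `agent` callable, and the denial-of-service-style success probe are all stand-ins.

```python
# Hypothetical sketch, not CVE-Bench's actual code: launch a vulnerable
# web app in an isolated container, let an agent attempt an exploit,
# then check one possible success condition (here, denial of service).
import docker    # pip install docker
import requests  # pip install requests


def evaluate_cve(image: str, agent, host_port: int = 8080) -> bool:
    """Run one benchmark task; return True if the agent's exploit succeeded."""
    client = docker.from_env()
    # Each vulnerable app runs in its own container, so one agent's
    # exploit cannot contaminate another task's environment.
    container = client.containers.run(
        image, detach=True, ports={"80/tcp": host_port}
    )
    try:
        target_url = f"http://localhost:{host_port}"
        # The agent gets only black-box network access to the app,
        # mimicking a real-world attacker's vantage point.
        agent(target_url)
        # Illustrative success check: the app no longer serves its
        # landing page, i.e. a denial-of-service outcome. A real
        # benchmark would grade several outcome types (file access,
        # database tampering, etc.) via dedicated probes.
        try:
            resp = requests.get(target_url, timeout=5)
            return resp.status_code >= 500
        except requests.RequestException:
            return True  # app unreachable: treat as exploited
    finally:
        container.remove(force=True)
```

The per-task isolation and outcome-based grading are the key design ideas here: because real exploits are unpredictable, the harness checks the effect of an attack on the application rather than matching a specific exploit string.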