Benchmarks for AI-assisted Formal Verification

Guaranteed Safe AI Seminars

Zoom

Past Event

Welcome! To join the event, please register below.

You will be asked to verify token ownership with your wallet.

About Event

Benchmarks for AI-assisted Formal Verification
Theodore Ehrenborg – AI Safety researcher at the Beneficial AI Foundation and PIBBSS

LLMs have shown promise at generating formal proofs, which can be automatically verified despite their untrusted origin. The most impressive LLM-generated proofs have been for mathematical theorems, and hence those techniques can't be directly applied to formal verification (i.e. proving that software meets a specification). We present two benchmarks for measuring how well LLMs can formally verify software.

The first benchmark is the largest to our knowledge (7141 examples in Lean, 2334 examples in Verus, and 3029 examples in Dafny). We standardized the format of source benchmarks and used LLM translators to augment it. We measure the success rate when an LLM has to generate and verify code given only the specification: 27% for Lean, 44% for Verus, and 82% for Dafny.

The second benchmark is a corpus of manually-checked specifications in Lean and Verus, based on the production cryptography library https://github.com/dalek-cryptography/curve25519-dalek. Many of these real-world examples are difficult. In some cases it takes days for a human expert to verify that a function meets its specification. In preliminary work using Inspect AI, we find that GPT-5-mini can replicate 17/65 of the Lean proofs.

We encourage others to attempt these benchmarks. If AIs can reduce the cost of formal verification, this will prevent bugs in human-generated and AI-generated software.

Guaranteed Safe AI seminars

The monthly seminar series on Guaranteed Safe AI brings together researchers to advance the field of building AI with high-assurance quantitative safety guarantees.

Presented by

Guaranteed Safe AI Seminars

Monthly seminars on Guaranteed Safe AI R&D. https://www.horizonomega.org/p/guaranteed-safe-ai

Hosted By

51 Went

AI

Benchmarks for AI-assisted Formal Verification

​​​​Guaranteed Safe AI seminars

Guaranteed Safe AI seminars