Containment Verification: AI Safety Guarantees Independent of Alignment

Guaranteed Safe AI Seminars

Zoom


About Event

Royce Moon – Enclave Intelligence

AI safety today is overwhelmingly empirical: approaches such as red teaming, chain-of-thought monitoring, control evaluations, and interpretability all remain conditional on properties of model behavior or internals. This talk presents containment verification, an approach for obtaining universal, capability-agnostic guarantees over the modeled action boundary by verifying the agentic framework through which AI systems act, rather than the model itself. We model AI as an unconstrained oracle over typed actions, so the proof applies whether the system is narrow, general, superintelligent, aligned, misaligned, or adversarial. The result is auditable, fail-safe infrastructure for agentic AI that does not depend on fragile assumptions about model behavior or internal dynamics.

Paper: https://arxiv.org/abs/2605.09045
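As a rough illustration of the idea in the abstract, the Python toy below treats the model as an oracle that may return any value of a typed action and checks the framework's gate rather than the model. This is a hypothetical sketch, not the formalism or proof method from the paper: the action kinds, `allowlist_gate`, `violates_invariant`, and the paths and hosts are invented for illustration, and exhaustive enumeration over a small finite action space stands in for a formal proof.

```python
# Hypothetical toy model of "verify the framework, not the model".
# Not the paper's formalism; brute-force enumeration stands in for a proof.
from dataclasses import dataclass
from enum import Enum
from itertools import product
from typing import Callable, Iterable, List, Optional


class Kind(Enum):
    READ_FILE = "read_file"
    WRITE_FILE = "write_file"
    HTTP_GET = "http_get"


@dataclass(frozen=True)
class Action:
    """A typed action: the only thing the model is allowed to emit."""
    kind: Kind
    target: str


Gate = Callable[[Action], bool]
Oracle = Callable[[], Action]

ALLOWED_HOSTS = frozenset({"api.internal"})  # hypothetical network allowlist


def allowlist_gate(action: Action) -> bool:
    """Hypothetical framework policy: deny by default, allow a few action shapes."""
    if action.kind is Kind.READ_FILE:
        return True
    if action.kind is Kind.WRITE_FILE:
        return action.target.startswith("/sandbox/")
    if action.kind is Kind.HTTP_GET:
        return action.target in ALLOWED_HOSTS
    return False


def framework_step(oracle: Oracle, gate: Gate = allowlist_gate) -> Optional[Action]:
    """Query the untrusted oracle once; return (i.e. execute) its action only if the gate allows it."""
    proposed = oracle()
    return proposed if gate(proposed) else None


def violates_invariant(action: Action) -> bool:
    """Safety property stated independently of the gate: no writes under /etc/, no external egress."""
    if action.kind is Kind.WRITE_FILE and action.target.startswith("/etc/"):
        return True
    if action.kind is Kind.HTTP_GET and not action.target.endswith(".internal"):
        return True
    return False


def counterexamples(space: Iterable[Action], gate: Gate = allowlist_gate) -> List[Action]:
    """Let the oracle return ANY action in the modeled space and collect executed violations."""
    bad = []
    for candidate in space:
        executed = framework_step(lambda c=candidate: c, gate)
        if executed is not None and violates_invariant(executed):
            bad.append(candidate)
    return bad


# Hypothetical finite action space: every kind paired with every target.
TARGETS = ["/sandbox/out.txt", "/etc/passwd", "/home/user/notes", "api.internal", "evil.example"]
MODELED_SPACE = [Action(k, t) for k, t in product(Kind, TARGETS)]

print(counterexamples(MODELED_SPACE))                         # → []
print(len(counterexamples(MODELED_SPACE, lambda a: True)))    # → 5 (permissive gate fails)
```

Because the check ranges over every action the oracle could emit, its verdict does not depend on how capable or well-aligned the model is; swapping in a permissive gate (`lambda a: True`) immediately produces counterexamples. A real action space is not finite, so a practical version would need symbolic reasoning rather than enumeration.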

Guaranteed Safe AI seminars

The monthly seminar series on Guaranteed Safe AI brings together researchers to advance the field of building AI with high-assurance quantitative safety guarantees.

Presented by

Guaranteed Safe AI Seminars

Monthly seminars on Guaranteed Safe AI R&D. https://www.horizonomega.org/p/guaranteed-safe-ai
