

Containment Verification: AI Safety Guarantees Independent of Alignment
Royce Moon – Enclave Intelligence
AI safety today is overwhelmingly empirical: approaches such as red teaming, chain-of-thought monitoring, control evaluations, and interpretability give assurances that remain conditional on properties of model behavior or internals. This talk presents containment verification, an approach for obtaining universal, capability-agnostic guarantees over the modeled action boundary by verifying the agentic framework through which AI systems act, rather than the model itself. We model the AI as an unconstrained oracle over typed actions, so the proof applies whether the system is narrow, general, superintelligent, aligned, misaligned, or adversarial. The result is auditable, fail-safe infrastructure for agentic AI that does not depend on fragile assumptions about model behavior or internal dynamics.
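As a rough illustration of the idea of an unconstrained oracle acting through a checked framework, here is a minimal Python sketch. It is not taken from the paper: the action types (`ReadFile`, `WriteFile`), the `SANDBOX` prefix, and the `permitted` and `gate` functions are all hypothetical names chosen for this example, and a runtime check like this is only a stand-in for a formal proof about the framework.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


# Hypothetical typed action space (illustrative only, not from the paper).
@dataclass(frozen=True)
class ReadFile:
    path: str


@dataclass(frozen=True)
class WriteFile:
    path: str
    data: str


Action = Union[ReadFile, WriteFile]

SANDBOX = "/sandbox/"  # assumed boundary for this sketch


def permitted(action: object) -> bool:
    """Boundary policy: only typed actions confined to the sandbox pass."""
    if isinstance(action, (ReadFile, WriteFile)):
        return action.path.startswith(SANDBOX) and ".." not in action.path
    return False  # fail-safe: anything untyped or unknown is rejected


def gate(oracle: Callable[[str], object], observation: str) -> Optional[Action]:
    """Query the oracle; release its proposal only if the policy allows it.

    Nothing is assumed about the oracle: it may return any object or raise.
    """
    try:
        proposal = oracle(observation)
    except Exception:
        return None  # an oracle failure results in no action
    return proposal if permitted(proposal) else None


if __name__ == "__main__":
    # A few stand-ins for arbitrary oracle outputs, including malformed ones.
    outputs = [
        WriteFile("/etc/passwd", "x"),
        ReadFile("/sandbox/../etc/shadow"),
        "rm -rf /",
        None,
        ReadFile("/sandbox/notes.txt"),
    ]
    for out in outputs:
        result = gate(lambda _obs, out=out: out, "observation")
        # The invariant depends only on the gate, never on the oracle.
        assert result is None or permitted(result)
```

The point of the sketch is where the invariant lives: whatever `gate` releases satisfies `permitted`, and that follows from the code of `gate` alone, so it holds for any oracle one could plug in.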
Paper: https://arxiv.org/abs/2605.09045
Guaranteed Safe AI seminars
The monthly seminar series on Guaranteed Safe AI brings together researchers to advance the field of building AI with high-assurance quantitative safety guarantees.