

Operationalizing AI: the standard for agents you can trust
AI agents are being deployed faster than they can be trusted.
Most evaluation approaches break down in real environments: static benchmarks miss real workflows, output-only scoring ignores how decisions are made, and systems that perform well in testing often fail under production conditions.
In the inaugural session of our "Agentic In Action" series, Christopher Sniffen (Federal Applied AI Lead at Snorkel AI; former NSA AI lead) shares a practical approach to defining, measuring, and improving AI agent performance in production. He walks through how leading organizations establish what "good" looks like, evaluate full agent behavior, and build continuous feedback loops that drive measurable improvement.
The session draws on real deployments across government and regulated industries, with a focus on moving from experimentation to operational confidence.