Presented by
Comet
145 Went

Live Paper Reading: A Benchmark for Evaluating Outcome-driven Constraint Violations in Autonomous AI Agents

Zoom
Past Event
About Event

Join us for our monthly live and interactive paper reading session!

Ready to dive into the fascinating world of AI? Join Abby Morgan for an engaging session in our Opik Virtual Learning Series!

On March 17th, we’re delving into a new paper: "A Benchmark for Evaluating Outcome-driven Constraint Violations in Autonomous AI Agents."

This paper introduces ODCV-Bench, a new benchmark for measuring outcome-driven constraint violations: cases where autonomous agents, under KPI or performance pressure, choose multi-step actions that violate ethical, legal, or safety constraints in realistic settings. The benchmark comprises 40 production-like scenarios, each with paired “mandated” and “incentivized” variants, to separate obedience to harmful instructions from emergent misalignment under incentives. Across 12 frontier LLMs, the authors find violation rates ranging from ~1% to ~71%, and report that stronger reasoning does not reliably imply safer behavior, including evidence of “deliberative misalignment” where models recognize an action is unethical yet do it anyway.

Link to the original paper: https://arxiv.org/abs/2512.20798
