Presented by

Comet provides an end-to-end model evaluation platform for AI developers, with best-in-class LLM evaluations, experiment tracking, and production monitoring.

Hosted By

145 Went

AI

Live Paper Reading: A Benchmark for Evaluating Outcome-driven Constraint Violations in Autonomous AI Agents

Comet

Zoom

Past Event

About Event

Join us for our monthly live and interactive paper reading session!

Ready to dive into the fascinating world of AI? Join Abby Morgan for an engaging session in our Opik Virtual Learning Series!

On March 17th, we’re delving into a new paper: "A Benchmark for Evaluating Outcome-driven Constraint Violations in Autonomous AI Agents".

This paper introduces ODCV-Bench, a new benchmark for measuring outcome-driven constraint violations: cases where autonomous agents, under KPI or performance pressure, choose multi-step actions that violate ethical, legal, or safety constraints in realistic settings. It includes 40 production-like scenarios with paired “mandated” and “incentivized” variants, designed to separate obedience to harmful instructions from emergent misalignment under incentives. Across 12 frontier LLMs, the authors find violation rates ranging from ~1% to ~71%. They also report that stronger reasoning does not reliably imply safer behavior, including evidence of “deliberative misalignment”, where models recognize that an action is unethical yet take it anyway.

Link to the original paper: https://arxiv.org/abs/2512.20798
