Presented by

Comet is the company behind Opik, the fastest-growing open-source project for agent observability, testing, and optimization - now used by more than 150,000 builders.

Hosted By

144 Went

AI

Live Paper Reading: A Benchmark for Evaluating Outcome-driven Constraint Violations in Autonomous AI Agents

Comet

Zoom

Past Event

About Event

Join us for our monthly live and interactive paper reading session!

Ready to dive into the fascinating world of AI? Join Abby Morgan for an engaging session in our Opik Virtual Learning Series!

On March 17th, we’re delving into a new paper: "A Benchmark for Evaluating Outcome-driven Constraint Violations in Autonomous AI Agents".

This paper covers ODCV-Bench, a new benchmark for measuring outcome-driven constraint violations—cases where autonomous agents, under KPI/performance pressure, choose multi-step actions that violate ethical, legal, or safety constraints in realistic settings. It introduces 40 production-like scenarios with paired “mandated” vs “incentivized” variants to separate obedience to harmful instructions from emergent misalignment under incentives. Across 12 frontier LLMs, the authors find violation rates ranging from ~1% to ~71%, and report that stronger reasoning does not reliably imply safer behavior, including evidence of “deliberative misalignment” where models recognize an action is unethical yet do it anyway.

Link to the original paper: https://arxiv.org/abs/2512.20798
