Hosted By
10 Went

Measuring Agents in Production

Hosted by NICE AI Talk
YouTube
About Event

Welcome to NICE Talk! | Measuring Agents in Production

How do you deploy LLM agents the right way? An analysis based on our survey of 306 industry practitioners and 20 in-depth, interview-based case studies across 26 domains.


YouTube Livestream: https://youtube.com/live/hcQmCWzwXyQ


Abstract:

AI agents are actively running in production across diverse industries, yet little is publicly known about which technical approaches enable successful real-world deployments. We present the first large-scale systematic study of AI agents in production, surveying 306 practitioners and conducting 20 in-depth case studies via interviews across 26 domains. We investigate why organizations build agents, how they build them, how they evaluate them, and what the top development challenges are. We find that production agents are typically built using simple, controllable approaches: 68% execute at most 10 steps before requiring human intervention, 70% rely on prompting off-the-shelf models instead of weight tuning, and 74% depend primarily on human evaluation. Reliability remains the top development challenge, driven by difficulties in ensuring and evaluating agent correctness. Despite these challenges, simple yet effective methods already enable agents to deliver impact across diverse industries. Our study documents the current state of practice and bridges the gap between research and deployment by providing researchers visibility into production challenges while offering practitioners proven patterns from successful deployments.

Invited Speakers:

Melissa Z. Pan is a Ph.D. student in Computer Science at UC Berkeley, advised by Prof. Matei Zaharia. Her research focuses on building efficient and sustainable computing systems for emerging machine learning and data-intensive workloads (e.g., agentic systems) at large scale, and on building reliable agents to support systems research. She is currently investigating energy-efficient and reliable agentic/compound AI systems through resource scheduling and cross-stack optimization. Melissa is also an Amazon AI Fellow and a Laude AI Resident.

Negar Arabzadeh is a postdoctoral researcher in Computer Science at the University of California, Berkeley, working with Prof. Matei Zaharia. She received her Ph.D. from the University of Waterloo, where she was advised by Dr. Charles L. A. Clarke. Her research lies at the intersection of information retrieval and large language models. She studies how retrieval should be designed, evaluated, and integrated into LLM-based information access systems, and how LLMs can be used both as subjects of evaluation and as evaluators in modern IR pipelines.

Host

Haolun Li, Ph.D. candidate at MILA & McGill University
