
MLOps Reading Group Oct – LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries

Zoom
About Event

As AI agents become more capable, their real-world performance increasingly depends on how well they can coordinate tools.

This month's paper:

LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries

introduces a benchmark designed to rigorously test how AI agents handle multi-step tasks using the Model Context Protocol (MCP) — the emerging standard for tool integration.

The authors present 101 carefully curated real-world queries, refined through iterative LLM rewriting and human review, that challenge models to coordinate multiple tools such as web search, file operations, mathematical reasoning, and data analysis.

What we’ll cover:

  • How LiveMCP-101 benchmarks real-world tool use and multi-step reasoning

  • Insights from the paper’s experiments and error analysis

  • Key failure modes in current agent architectures

  • Practical lessons for building more reliable, MCP-enabled systems

📅 Date: October 23rd

🕚 Time: 11am ET

Speakers:

David DeStefano: Lead Staff Software Engineer, Data/ML/AI Platform @EvolutionIQ

Sophia Skowronski: Data Scientist, Breckinridge Capital Advisors

Valdimar Ágúst Eggertsson: AI Development Team Lead, Snjallgögn (Smart Data Inc.)

Moderator:
Arthur Coleman: CEO, OnlineMatters Inc.

Join the #reading-group channel in the MLOps Community Slack to connect before and after the session.
