DataTalks.Club is a global online community of people who love data.

Build, Compare, and Evaluate MCP-powered AI Agents

About Event

In this hands-on session, you’ll create an MCP server using Snowflake Managed MCP, build an AI agent prototype, and connect the agent to the server.

You’ll then evaluate the agent end-to-end with TruLens: analyze results, identify failure modes, and improve the prototype by refining tool calling and tool selection. Finally, you’ll compare the original and improved versions using TruLens traces and evaluation metrics.

Rather than focusing only on agent construction, the workshop highlights how data access, tool design, and observability shape agent performance. You’ll see how relatively small changes, especially in tool definitions, can lead to measurable improvements in tool selection and tool calling.

The session uses a concrete example: a health research agent grounded in clinical trial and PubMed data available on the Snowflake Marketplace.

What you’ll learn:

  • How to build an AI agent backed by Snowflake-managed MCP servers

  • How agents discover and choose tools through MCP

  • How to design tool descriptions that influence agent behavior

  • How to evaluate agent quality using structured metrics

  • How to compare agent versions using observability and traces

  • Why data grounding matters for reliable agents

What we’ll do:

  • Build an initial agent version connected to Snowflake MCP

  • Evaluate its performance using TruLens metrics

  • Identify failure modes in tool selection and tool calling

  • Improve MCP tool definitions

  • Rebuild and re-evaluate a second agent version

  • Compare both versions side by side using their traces and evaluation data
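Refining tool definitions is central to the improvement step. As a rough illustration of the idea, here is what a vague versus a refined MCP tool definition might look like. The tool names, parameters, and descriptions below are invented for this sketch, not the workshop's actual definitions; only the top-level shape (name / description / input schema) follows the MCP tool format.

```python
# Hypothetical MCP tool definitions, before and after refinement.
# All names and descriptions are illustrative assumptions.

# Vague version: generic name, no guidance on when to use the tool,
# and an opaque parameter name ("q").
vague_tool = {
    "name": "search",
    "description": "Search data.",
    "inputSchema": {
        "type": "object",
        "properties": {"q": {"type": "string"}},
        "required": ["q"],
    },
}

# Refined version: a specific name, a description that states what the
# tool covers and when to prefer it over a sibling tool, and
# self-documenting parameters with their own descriptions.
refined_tool = {
    "name": "search_clinical_trials",
    "description": (
        "Search registered clinical trials by condition or trial phase. "
        "Use this for questions about ongoing or completed studies; "
        "use search_pubmed for published literature instead."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "condition": {
                "type": "string",
                "description": "Disease or condition, e.g. 'type 2 diabetes'.",
            },
            "phase": {
                "type": "string",
                "enum": ["1", "2", "3", "4"],
                "description": "Optional trial phase filter.",
            },
        },
        "required": ["condition"],
    },
}
```

Because the agent selects tools based on these definitions, changes like the ones above, clearer names, scoped descriptions, and documented parameters, are exactly the kind of small edits that can measurably improve tool selection and tool calling.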

By the end of the session, you will have a clear, practical understanding of how to build, evaluate, and iterate on agents, and of how observability makes agent development more structured and transparent.

Please come prepared with a fresh Python environment (such as Jupyter) to run the lab.

About the speaker:

Josh is a developer advocate for AI and Open Source at Snowflake, previously at TruEra (acquired by Snowflake). He is also a maintainer of TruLens, an open-source library for systematically tracking and evaluating LLM-based applications.

Josh regularly delivers tech talks and workshops at events including PyData, Devoxx, AI_Dev, AI DevWorld, and AI Camp meetups. He has also developed and taught courses on a variety of platforms, including Coursera, DeepLearning.ai, Udemy, and DataCamp, and served as an advisor for Trustworthy Machine Learning at Stanford.


DataTalks.Club is the place to talk about data. Join our Slack community!



This post is sponsored by Snowflake.
