

Who Watches the Bots? Observability, Testing, and Trust for Agentic AI with MLflow
Agenda:
6:00 - 6:30 PM - Welcome and mingle
6:30 - 6:45 PM - Introductions
6:45 - 7:30 PM - Talk
7:30 - 8:00 PM - Wrap up
Description:
AI coding agents can now scaffold entire applications from a single prompt. But for mission-critical systems that require compliance, reliability, and auditability, "it works on my machine" isn't enough. When your agent makes fifty tool calls, rewrites three files, and confidently hallucinates a security vulnerability, how do you even begin to debug that?
This talk introduces a practical, open-source toolkit for injecting observability, control, and systematic testing into agentic AI development. Using MLflow's tracing and evaluation framework, we'll follow a single throughline: building an AI agent with a coding assistant (Claude Code), tracing both the assistant's work and the agent it produces, then evaluating quality with LLM judges, closing the loop from development through production monitoring.
You'll walk away with a concrete workflow for turning the black box of agentic coding into something you can inspect, measure, and trust. No vendor lock-in required.
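To give a flavor of the tracing side of the talk, here is a minimal sketch (not the speaker's code) of MLflow's tracing decorator applied to a toy agent. It assumes MLflow 2.14 or later; the experiment name and both functions are hypothetical stand-ins for a real tool and agent.

```python
# Minimal sketch of MLflow tracing on a toy agent (illustrative only).
import mlflow

mlflow.set_experiment("agent-observability-demo")  # hypothetical experiment name

@mlflow.trace(span_type="TOOL")
def search_codebase(query: str) -> str:
    # Each call becomes a span with inputs, outputs, and latency recorded.
    return f"stub results for: {query}"

@mlflow.trace(span_type="AGENT")
def run_agent(prompt: str) -> str:
    # Nested spans let you inspect every tool call the agent made.
    context = search_codebase(prompt)
    return f"answer grounded in: {context}"

run_agent("where is the authentication logic?")  # view the nested trace in the MLflow UI
```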
Speaker Bio:
Tim Lortz is an AI Product Specialist at Databricks, where he has worked since 2019. He currently serves as the technical lead for Databricks' go-to-market efforts for AI in regulated industries, focusing particularly on enterprise-grade governance for AI platforms.
Before joining Databricks, Tim spent many years in data science practitioner and leadership roles in the federal contracting space. He holds a PhD in Industrial & Operations Engineering from the University of Michigan (Go Blue!!) and a BS in Industrial Engineering from the University of Pittsburgh.