

Agentic + AI Observability Meetup SF
Join us for the Agentic + AI Observability meetup on Tuesday, February 17, from 5pm to 8pm PST at the Databricks SF office: an evening focused on agentic architectures and AI observability, covering how to design, ship, and monitor AI agents that actually work in production.
This meetup is built for engineers, ML practitioners, and AI startup founders who are already experimenting with agents (or planning to) and want to go deeper into the tech. We’ll cover real-world patterns, failure modes, and tooling for building reliable agentic systems in the broader open-source ecosystem.
Whether you’re at an early-stage startup or an established company, if you care about getting AI agents into production and keeping them healthy, this meetup is for you.
Why you should attend
See real architectures: Learn how teams are designing agentic systems on top of data/feature platforms, retrieval, and tools, not just calling a single LLM endpoint.
Learn how to observe what agents are doing: Go beyond logs and dashboards to structured traces, evals, and metrics that help you understand and improve agent behavior over time.
Get hands-on with MLflow and observability tools: Watch live demos of MLflow, tracing integrations, and evaluation workflows for agentic systems (see the short tracing sketch after this list).
Connect with other builders: Meet engineers, founders, and practitioners working on similar problems, swap patterns, and find collaborators and potential hires.
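Curious what those structured traces look like? Here is a minimal sketch, assuming MLflow's tracing API (the @mlflow.trace decorator and mlflow.start_span); the retrieval and LLM helpers are hypothetical stubs, not code from the talks.

```python
# Minimal tracing sketch: one trace per question, with retrieval and
# generation recorded as child spans. retrieve_docs/call_llm are stubs.
import mlflow

mlflow.set_experiment("agent-observability-demo")

def retrieve_docs(question: str) -> list[str]:
    return ["doc about MLflow tracing"]  # stand-in for a real retriever

def call_llm(question: str, docs: list[str]) -> str:
    # stand-in for a real LLM call
    return f"Answer to {question!r} grounded in {len(docs)} doc(s)"

@mlflow.trace  # captures inputs, outputs, timing, and nested spans
def answer_question(question: str) -> str:
    with mlflow.start_span(name="retrieval") as span:
        docs = retrieve_docs(question)
        span.set_attributes({"num_docs": len(docs)})
    with mlflow.start_span(name="generation"):
        return call_llm(question, docs)

print(answer_question("How do I trace an agent?"))
# The resulting trace is browsable in the MLflow UI under the experiment above.
```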
Agenda
5:00pm: Registration/Mingling
6:00pm: Welcome Remarks by Jules Damji, Staff Developer Advocate, Databricks
6:15pm: Talk #1 - Building Trustworthy, High-Quality AI Agents with MLflow
6:45pm: Talk #2 - Evaluating AI in Production: A Practical Guide
7:15pm: Mingling with bites + dessert
8:00pm: Night Ends
Speakers
Staff Software Engineer, Databricks
Head of Data & Product Growth, Braintrust
Session Descriptions
Building Trustworthy, High-Quality AI Agents with MLflow
Building trustworthy, high-quality agents remains one of the hardest problems in AI today. Even as coding assistants automate parts of the development workflow, evaluating, observing, and improving agent quality is still manual, subjective, and time-consuming.
Teams spend hours “vibe checking” agents, labeling outputs, and debugging failures. But it doesn’t have to be this slow or tedious. In this session, you’ll learn how to use MLflow to automate and accelerate agent observability for quality improvement, applying proven patterns to deliver agents that behave reliably in real-world conditions.
Key Takeaways and Learnings
Understand the agent development lifecycle and where observability fits into it
Use MLflow's key components across the development lifecycle to improve observability: tracking and debugging, evaluation with MLflow judges, and a prompt registry for versioning
Select the right judges from a suite of 60+ built-in and custom MLflow judges for evaluation, and use Judge Builder for automatic evaluation (a small evaluation sketch follows this list)
Use the MLflow UI to compare and understand evaluation scores and metrics
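For a taste of what that evaluation loop can look like, here is a minimal sketch using mlflow.evaluate with one of MLflow's built-in LLM-judged metrics (answer_relevance). The tiny dataset is made up for illustration, a judge model API key must be configured, and exact argument names can vary across MLflow versions; treat it as a sketch rather than the speaker's exact workflow.

```python
# Sketch of LLM-as-judge evaluation over a static dataset of agent outputs.
# The data is illustrative; answer_relevance calls out to a judge model,
# so an OpenAI (or other configured) API key is required.
import mlflow
import pandas as pd
from mlflow.metrics.genai import answer_relevance

eval_df = pd.DataFrame(
    {
        "inputs": ["What does MLflow tracing capture?"],
        "outputs": ["It records spans with inputs, outputs, and latency."],
    }
)

with mlflow.start_run():
    results = mlflow.evaluate(
        data=eval_df,
        predictions="outputs",               # column holding the agent's answers
        extra_metrics=[answer_relevance()],  # built-in LLM judge
    )
    print(results.metrics)  # aggregate scores, also visible in the MLflow UI
```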
Evaluating AI in Production: A Practical Guide
Evaluations are essential for shipping reliable AI products, but many teams struggle to move beyond manual testing. In this talk, I'll walk through how to build a production-ready evaluation framework — from choosing the right metrics and creating effective test cases to setting up continuous evaluation pipelines that catch issues before your users do. You'll walk away with practical patterns you can apply right away.
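As a rough preview of that framework (not the speaker's or any vendor's API), here is a tiny, framework-agnostic sketch of a continuous evaluation gate: fixed test cases, a scoring function, and a threshold that fails the run before a regression reaches users. Every name in it is a hypothetical placeholder.

```python
# Hypothetical, framework-agnostic sketch of a continuous evaluation gate.
# Swap run_agent/score_case for your own agent call and metric (or a hosted
# eval platform); the point is the structure: cases -> scores -> pass/fail.
import sys

TEST_CASES = [
    {"input": "Summarize our refund policy", "must_mention": "30 days"},
    {"input": "Which plans support SSO?", "must_mention": "Enterprise"},
]

def run_agent(prompt: str) -> str:
    # Placeholder for your real agent / LLM call.
    return f"(stub answer for: {prompt})"

def score_case(case: dict, output: str) -> float:
    # Simple keyword check; in practice this could be an LLM judge or exact match.
    return 1.0 if case["must_mention"].lower() in output.lower() else 0.0

def main(threshold: float = 0.9) -> None:
    scores = [score_case(c, run_agent(c["input"])) for c in TEST_CASES]
    accuracy = sum(scores) / len(scores)
    print(f"eval accuracy: {accuracy:.2f} over {len(scores)} cases")
    if accuracy < threshold:
        sys.exit(1)  # fail the CI job so the regression never ships

if __name__ == "__main__":
    main()
```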