

Multi-Agent Evaluation Workshop (In Person, Munich)
Learn to build and evaluate multi-agent AI systems. Using the Strands SDK, Amazon Bedrock, and AgentCore, you'll construct a four-agent graph workflow (document analysis, policy retrieval, risk inspection, and claim summarization), then measure its performance through operational monitoring and model quality evaluation.
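To give a feel for the workflow, here is a minimal sketch of the four-agent graph, assuming the `Agent` and `GraphBuilder` interfaces from the Strands SDK (exact method names and invocation style may vary by version); the system prompts and task text are illustrative only.

```python
# Minimal sketch, assuming Strands' Agent and GraphBuilder interfaces;
# prompts and the task string are illustrative, not the workshop's materials.
from strands import Agent
from strands.multiagent import GraphBuilder

# One agent per workshop role.
doc_analyzer = Agent(name="document_analysis",
                     system_prompt="Extract key facts from claim documents.")
policy_retriever = Agent(name="policy_retrieval",
                         system_prompt="Retrieve policy clauses relevant to the claim.")
risk_inspector = Agent(name="risk_inspection",
                       system_prompt="Flag fraud indicators and coverage gaps.")
summarizer = Agent(name="claim_summarization",
                   system_prompt="Summarize all findings into a decision brief.")

builder = GraphBuilder()
builder.add_node(doc_analyzer, "analyze")
builder.add_node(policy_retriever, "policies")
builder.add_node(risk_inspector, "risk")
builder.add_node(summarizer, "summarize")

# Graph-based routing: analysis fans out to policy retrieval and risk
# inspection, and both feed the final summarizer.
builder.add_edge("analyze", "policies")
builder.add_edge("analyze", "risk")
builder.add_edge("policies", "summarize")
builder.add_edge("risk", "summarize")
builder.set_entry_point("analyze")

graph = builder.build()
result = graph("Evaluate the attached auto claim against the policy documents.")
```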
📌 Agenda Highlights
Multi-agent orchestration with graph-based routing
CloudWatch dashboards for GenAI metrics (tokens, latency, cost); see the first sketch below
Programmatic accuracy testing (Precision, Recall, F1); see the second sketch below
LLM-as-Judge (model) evaluation with 10+ built-in and custom evaluators; see the third sketch below
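First, the operational-monitoring sketch: publishing per-invocation GenAI metrics to CloudWatch with boto3, which a dashboard can then chart. The namespace, dimension, and metric names are illustrative choices, not a fixed schema from the workshop.

```python
# Hedged sketch: push token, latency, and cost datapoints for one agent
# invocation to CloudWatch. Namespace/metric names are illustrative.
import boto3

cloudwatch = boto3.client("cloudwatch")

def publish_invocation_metrics(agent_name: str, input_tokens: int,
                               output_tokens: int, latency_ms: float,
                               cost_usd: float) -> None:
    """Record one invocation's metrics under a per-agent dimension."""
    dimensions = [{"Name": "AgentName", "Value": agent_name}]
    cloudwatch.put_metric_data(
        Namespace="GenAI/Workshop",
        MetricData=[
            {"MetricName": "InputTokens", "Dimensions": dimensions,
             "Value": input_tokens, "Unit": "Count"},
            {"MetricName": "OutputTokens", "Dimensions": dimensions,
             "Value": output_tokens, "Unit": "Count"},
            {"MetricName": "Latency", "Dimensions": dimensions,
             "Value": latency_ms, "Unit": "Milliseconds"},
            {"MetricName": "EstimatedCost", "Dimensions": dimensions,
             "Value": cost_usd, "Unit": "None"},
        ],
    )
```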
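Second, the accuracy-testing sketch: the Precision, Recall, and F1 formulas from the agenda, computed in plain Python over binary correctness labels (1 = correct, 0 = incorrect).

```python
# Precision = TP/(TP+FP), Recall = TP/(TP+FN), F1 = harmonic mean of the two.
def precision_recall_f1(predicted: list[int], actual: list[int]):
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: 4 agent outputs scored against ground truth.
print(precision_recall_f1([1, 1, 0, 1], [1, 0, 0, 1]))  # ~(0.667, 1.0, 0.8)
```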
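Third, a generic LLM-as-Judge sketch using the Amazon Bedrock Converse API: a hand-rolled stand-in for the built-in and custom evaluators named in the agenda. The rubric, JSON output contract, and model ID are assumptions for illustration.

```python
# Hedged sketch: score one question/answer pair with a judge model via the
# Bedrock Converse API. Rubric and model ID are illustrative assumptions.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

JUDGE_PROMPT = """You are grading an AI agent's answer.
Question: {question}
Answer: {answer}
Score correctness from 1 (wrong) to 5 (fully correct) and reply as JSON:
{{"score": <int>, "reason": "<one sentence>"}}"""

def judge(question: str, answer: str,
          model_id: str = "anthropic.claude-3-5-sonnet-20240620-v1:0") -> dict:
    """Return the judge model's score and rationale as a dict."""
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [
            {"text": JUDGE_PROMPT.format(question=question, answer=answer)}]}],
        inferenceConfig={"temperature": 0.0},
    )
    return json.loads(response["output"]["message"]["content"][0]["text"])
```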
🎯 This workshop is intended for intermediate Python developers and engineers with AWS experience who are interested in building and evaluating AI agent systems. Familiarity with REST APIs and JSON is required; knowledge of ML metrics is helpful but not mandatory.