

Datadog and AWS Workshop: Evaluating and Iterating on AI Agents
Join Datadog and AWS for a hands-on workshop (bring your laptop!) on how to evaluate and iterate on LLM applications and agentic systems powered by Amazon Bedrock.
This session is designed for AI practitioners and developers.
You’ll leave with practical techniques to troubleshoot issues in production, test ideas, validate improvements, and prevent regressions. You’ll learn to make evidence-based design choices using Amazon Bedrock’s unified API and production-ready features, improving reliability, cost efficiency, and user experience across LLM applications and agentic workflows.
What You'll Build With:
Datadog LLM Observability: Unified tracing, experimentation, and evaluation to accelerate development, reduce costs, and safeguard quality across the entire agent development lifecycle.
Amazon Bedrock: AWS's fully managed foundation model service, providing secure, scalable access to Claude models (a minimal invocation sketch follows this list).
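For a concrete sense of Bedrock's unified API before the lab, here is a minimal sketch of invoking a Claude model through the Converse API with boto3. This is illustrative rather than the workshop's lab code: the model ID and region are assumptions, and you'll need AWS credentials with Bedrock access enabled.

```python
# Minimal sketch: calling a Claude model via Amazon Bedrock's Converse API.
# Assumes AWS credentials are configured and the model is enabled in your
# account; the model ID and region below are illustrative.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed model ID
    messages=[{"role": "user", "content": [{"text": "In one sentence, what is an AI agent?"}]}],
    inferenceConfig={"maxTokens": 256, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```

The same converse() call shape works across the model families Bedrock hosts, which is what makes the side-by-side model comparisons in the lab straightforward.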
What You'll Learn:
You will work through the key stages of the LLM development workflow:
Monitor: Trace prompts, responses, and agent steps to see how Bedrock models behave. Track latency, token usage, and errors to find bottlenecks (see the tracing sketch after this list).
Iterate: Test prompts, models, and configurations to compare accuracy, cost, and performance, using real traces from production.
Evaluate: Run built-in and custom checks to catch hallucinations, drift, unsafe responses, PII leaks, and injection attempts across your Bedrock-powered agents.
Optimize: Select the best Claude model for each agent by weighing accuracy, speed, cost, and security. Learn how to balance Bedrock’s powerful models with real-world constraints.
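To preview the Monitor step, here is a minimal sketch (not the lab code) of wrapping a Bedrock call in a Datadog LLM Observability workflow span with the ddtrace SDK. It assumes DD_API_KEY and DD_SITE are set in the environment and that ddtrace's Bedrock (botocore) integration is active; the ml_app name and model ID are illustrative.

```python
# Minimal sketch: tracing a Bedrock-backed step with Datadog LLM Observability.
# Assumes DD_API_KEY / DD_SITE are set; names and model ID are illustrative.
import boto3
from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import workflow

LLMObs.enable(ml_app="bedrock-workshop-agent", agentless_enabled=True)

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

@workflow
def answer_question(question: str) -> str:
    # The Bedrock call goes through botocore; if ddtrace's Bedrock
    # integration is active, the prompt, response, latency, and token
    # counts appear as an LLM span nested under this workflow span.
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed model ID
        messages=[{"role": "user", "content": [{"text": question}]}],
    )
    answer = response["output"]["message"]["content"][0]["text"]
    LLMObs.annotate(input_data=question, output_data=answer)
    return answer

print(answer_question("What can an evaluation catch that a unit test can't?"))
```

The same traces then feed the Iterate and Evaluate steps: you can build datasets from them, rerun experiments against them, and attach evaluations to the resulting spans.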
Who Should Attend:
This workshop is designed for AI engineers, ML practitioners, data scientists, and developers building and running LLM applications and agentic systems.
Schedule (subject to change)
5:30 PM – Networking (food & beverages provided)
6:00 PM – Intro to Amazon Bedrock
6:15 PM – Intro to Datadog LLM Observability
6:30 PM – Hands-on lab
7:30 PM – Networking
Speakers
Charles Jacquet
Charles Jacquet is a Product Manager at Datadog, where he leads LLM and AI Agent Evaluation. He focuses on helping teams run experiments, build datasets, and assess the reliability and safety of AI applications. Before Datadog, Charles was a Machine Learning Engineer and later a Product Manager at Gorgias, where he launched an LLM-powered QA system.