Presented by LangWatch

Automating Evals & Scenarios with Claude Code, Skills & LangWatch MCP

About Event

Building reliable AI products requires more than prompts and models—it requires systematic testing, evaluation, and iteration. In this webinar, Rogeiro and Sergio will walk through how teams can automate evaluations and scenario testing using Claude Code, Skills, and LangWatch MCP to build more robust AI systems.

The session will explore how both engineers and product managers can design, automate, and scale evaluation workflows that continuously validate AI behavior across real-world scenarios. You'll see how developer tools like Claude Code and Skills can be combined with LangWatch MCP to simulate agent behavior, test edge cases, and monitor performance before releasing AI features to users.
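
As a rough illustration of the kind of workflow the session covers, the sketch below shows an automated scenario test for an agent. It is a hypothetical example, not the LangWatch API or the exact setup shown in the webinar: run_agent is a stand-in for the agent under test, and the scenarios are placeholders for the simulated user behavior and edge cases you would define.

# Hypothetical sketch of an automated scenario test for an AI agent.
# run_agent and the scenario data are placeholders, not the LangWatch API;
# the webinar covers driving this kind of loop with Claude Code, Skills,
# and the LangWatch MCP server.
import pytest


def run_agent(user_message: str) -> str:
    """Stand-in for the agent under test; replace with a real agent call."""
    if "#" in user_message:
        return "Your refund has been started. Next, you'll receive a confirmation email."
    return "Could you share your order number so I can look into the refund?"


SCENARIOS = [
    # (scenario name, simulated user message, fragments the reply must contain)
    ("refund_happy_path", "I want a refund for order #1234", ["refund"]),
    ("edge_case_missing_order", "Refund me", ["order number"]),
]


@pytest.mark.parametrize("name,user_message,must_contain", SCENARIOS)
def test_agent_scenario(name, user_message, must_contain):
    reply = run_agent(user_message)
    for fragment in must_contain:
        assert fragment.lower() in reply.lower(), (
            f"scenario {name!r}: expected {fragment!r} in the agent reply"
        )

Running this with pytest turns each scenario into a repeatable check that can be re-run on every change, which is the kind of loop the speakers automate further with Claude Code and LangWatch MCP.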

Together, Rogeiro and Sergio will demonstrate practical approaches for turning evaluation from a manual process into an automated system that supports faster experimentation, safer releases, and better AI product quality.

What you'll learn

  • How to automate AI evaluations using Claude Code and LangWatch

  • How to design realistic scenarios and simulations for testing AI systems

  • How engineers can integrate evals into their development workflow

  • How product managers can define evaluation criteria and measure AI performance (a minimal sketch of such criteria follows this list)

  • How LangWatch MCP enables scalable testing and monitoring of AI behavior
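
To make the evaluation-criteria point concrete, here is one hypothetical way criteria could be expressed as simple, declarative checks that run on every agent response. The names and thresholds are illustrative only, not LangWatch's built-in evaluators.

# Hypothetical sketch: evaluation criteria as declarative checks on agent replies.
# Criterion names and thresholds are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    name: str
    check: Callable[[str], bool]


CRITERIA = [
    Criterion("mentions_next_step", lambda reply: "next" in reply.lower()),
    Criterion("reasonable_length", lambda reply: len(reply.split()) <= 200),
    Criterion("no_internal_ids_leaked", lambda reply: "ticket-id" not in reply.lower()),
]


def evaluate(reply: str) -> dict[str, bool]:
    """Score one agent reply against every criterion."""
    return {criterion.name: criterion.check(reply) for criterion in CRITERIA}


if __name__ == "__main__":
    sample = "Your refund has been started. Next, you'll receive a confirmation email."
    print(evaluate(sample))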

Whether you're building AI-powered products, managing AI features, or developing agent-based systems, this session will provide practical insights into creating repeatable evaluation workflows that help teams ship AI with confidence.
