Future AGI

AI agents are powerful, but verifying that they are actually doing the right thing is a separate challenge. In this workshop, Future AGI will walk through practical ways to evaluate agent behavior using open-source tools and methods, with a focus on finding what works, what breaks, and where the risks show up.

You will get a hands-on look at how to assess agent performance across real scenarios, from task success and reliability to consistency and failure modes. Whether you are building your first agent or refining an existing system, this session is designed to help you think more clearly about evaluation so you can ship agentic applications with more confidence.

Workshop: Evaluate AI Agents (with open source)