America’s Next Top Modeler: Benchmarking AI Agents with Bryan Bischof (Theory Ventures)

YouTube
About Event

A Deep Dive into Agent Evaluation, Engineering Philosophy, and the "Framework Tax" with Bryan Bischof.

Bryan Bischof (Head of AI, Theory Ventures) recently ran a hackathon that wasn’t about demos: the goal was to build an agent that was objectively useful.

All agents were judged on predefined evals, chosen so that solving them proves value. The hackathon emphasised data science principles applied to debugging and optimizing modern AI systems, and the focus was on engineering depth, not demo-ability.

The result? Even the world’s best engineers struggled to move the needle.

Join Hugo Bowne-Anderson and Bryan Bischof for a technical, blunt, and wide-ranging conversation on the reality of building AI in production and what they found out at the hackathon.

What We’ll Cover

  • The America’s Next Top Modeler Benchmark: Why the median score at a world-class hackathon was only 23 out of 65, and what that tells us about the capability gap in modern models.

  • The Over-Engineering Trap: A post-mortem on why world-class engineers often try to build the perfect framework instead of solving the task. Learn how to avoid this in your own stack.

  • MCP as a Distribution Play: Why Bryan went from an MCP skeptic to a convert. It is about the only way to scale AI tools across an entire organization.

  • Sub-agents versus Skills: A reality check on agentic architectures. Are sub-agents actually useful, or are they just poorly defined skills and tool calls?

  • The Get it Working Good Then Fast Philosophy: Bryan’s strategy for building internal tools at a VC firm without getting bogged down in API churn.

Whether you’re an AI engineer struggling with "vibe-based" evaluation or a leader trying to separate agentic hype from reality, this conversation will help you understand how to (and how not to) build reliable AI systems.

Register to join live or get the recording afterwards.