The AI Capacity Crunch: A Builder’s Playbook for Latency, Lock-in & Cost

Hosted by VentureBeat
New York, New York
Registration
Approval Required
Your registration is subject to approval by the host.
About Event

Cloud regions are filling up. Latency is creeping higher. And every new model generation demands more compute than the last.

This session is built for the people who have to make AI actually run — the builders designing systems that stay fast, portable, and affordable under real-world pressure. We’ll break down the architecture patterns that matter now: model routing that keeps latency low, storage and I/O strategies that avoid lock-in, and token-spend controls that keep costs predictable as agentic workloads scale.

You’ll see two contrasting paths — Wonder’s cloud-first approach navigating capacity limits, and Recursion’s hybrid infrastructure squeezing every cycle from on-prem GPUs and file systems. You’ll leave with practical templates for multi-region readiness, eval-gated fallbacks, and contract structures that lock in stability while keeping experimentation flexible.

Beyond the discussion, this salon offers the rare chance to connect directly with the people shaping AI infrastructure — founders, CTOs, and architects comparing notes on what’s working, what isn’t, and where the next performance breakthroughs will come from.

  • The On-Prem Power Play: Ben Mabey, CTO of Recursion, helped build one of pharma’s largest supercomputers. His team bet on owning GPUs when everyone else went all-in on the cloud, and never looked back. Those same machines, purchased in 2017, still deliver. Ownership became their advantage.

  • The Cloud-Native Cost Paradox: James Chen, CTO of Wonder, is building a cloud-first AI ecosystem connecting Grubhub, Blue Apron, Relay, and Tastemade. But as capacity tightens, new costs emerge — from regional migrations to unpredictable token bills. For teams generating millions of AI-assisted lines of code a day, efficiency isn’t optional. It’s survival.

  • The Unit Economics of AI: Val Bercovici, Chief AI Officer at WEKA and founding member of Kubernetes, works with hyperscalers, neoclouds, and enterprises like Novartis running massive AI workloads. He has seen the math from both sides: petabyte-scale training workloads where parallel file-storage economics determined the on-prem decision, and a future where token bills spiral from “sending the same 10,000 tokens every single request.” Val will break down the true unit economics of AI, from watts per inference to the I/O performance that drives cloud costs, and show why memory efficiency and context caching at the storage layer determine whether your reasoning models and agents become competitive advantages or budget killers.

The decisions that matter

  • CapEx or OpEx. When does owning outperform renting?

  • Build or Buy. Where does abstraction start costing more than it saves?

  • Efficiency or Scale. When every token carries a price, what’s the lever that bends the curve?

Not a panel. Not a pitch.

A clear-eyed look — and a rare opportunity to compare notes with the people building the next generation of AI systems.

Location
Please register to see the exact location of this event.
New York, New York