The AI Capacity Crunch: A Builder’s Playbook for Latency, Lock-in & Cost

Hosted by VentureBeat
New York, New York
Registration
Approval Required
Your registration is subject to approval by the host.
About Event

Cloud regions are filling up. Latency is creeping higher. And every new model generation demands more compute than the last.

This session is built for the people who have to make AI actually run — the builders designing systems that stay fast, portable, and affordable under real-world pressure. We’ll break down the architecture patterns that matter now: model routing that keeps latency low, storage and I/O strategies that avoid lock-in, and token-spend controls that keep costs predictable as agentic workloads scale.

You’ll see two contrasting paths — Wonder’s cloud-first approach navigating capacity limits, and Recursion’s hybrid infrastructure squeezing every cycle from on-prem GPUs and file systems. You’ll leave with practical templates for multi-region readiness, eval-gated fallbacks, and contract structures that lock in stability while keeping experimentation flexible.

Beyond the discussion, this salon offers the rare chance to connect directly with the people shaping AI infrastructure — founders, CTOs, and architects comparing notes on what’s working, what isn’t, and where the next performance breakthroughs will come from.

  • The On-Prem Power Play: Ben Mabey, CTO of Recursion, helped build one of pharma’s largest supercomputers. His team bet on owning GPUs when everyone else went all-in on the cloud, and never looked back. Those same machines, purchased in 2017, still deliver. Ownership became their advantage.

  • The Cloud-Native Cost Paradox: James Chen, CTO of Wonder, is building a cloud-first AI ecosystem connecting Grubhub, Blue Apron, Relay, and Tastemade. But as capacity tightens, new costs emerge — from regional migrations to unpredictable token bills. For teams generating millions of AI-assisted lines of code a day, efficiency isn’t optional. It’s survival.

  • The Unit Economics of AI: Val Bercovici, Chief AI Officer at WEKA and founding member of Kubernetes, works with hyperscalers, neoclouds, and enterprises like Novartis running massive AI workloads. He has seen the math from both sides: petabyte-scale training workloads where parallel file-storage economics determined the on-prem decision, and a future where token bills spiral from “sending the same 10,000 tokens every single request.” Val will break down the true unit economics of AI, from watts per inference to the I/O performance that drives cloud costs, and show why memory efficiency and context caching at the storage layer determine whether your reasoning models and agents become competitive advantages or budget killers.

The decisions that matter

  • CapEx or OpEx. When does owning outperform renting?

  • Build or Buy. Where does abstraction start costing more than it saves?

  • Efficiency or Scale. When every token carries a price, what’s the lever that bends the curve?

Not a panel. Not a pitch.

A clear-eyed look — and a rare opportunity to compare notes with the people building the next generation of AI systems.

Location
Please register to see the exact location of this event.
New York, New York