

10× Cheaper AI Inference - Office Hours
Running AI in production does not have to mean runaway infrastructure bills. In this office hour, we will break down practical, real-world strategies to cut inference costs by up to 10× without sacrificing latency, quality, or reliability.
We will cover how teams are optimizing across the stack, including model selection, batching, caching, hardware choices, and deployment patterns that work in production. This is a hands-on session focused on trade-offs, decision frameworks, and concrete examples you can apply immediately.
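As a taste of the kind of optimization we will discuss, here is a minimal sketch of response caching, one of the cheapest wins on the list above. The model call is a hypothetical stand-in (`run_model` is not a real API); the point is that identical prompts can be served from memory instead of re-running inference:

```python
from functools import lru_cache

# Hypothetical stand-in for a real model call; in production this would
# hit an inference endpoint and incur per-request (or per-token) cost.
def run_model(prompt: str) -> str:
    run_model.calls += 1  # count how many times inference actually runs
    return f"answer for: {prompt}"

run_model.calls = 0

# Cache identical prompts so repeated requests skip inference entirely.
@lru_cache(maxsize=1024)
def cached_inference(prompt: str) -> str:
    return run_model(prompt)

if __name__ == "__main__":
    for _ in range(100):
        cached_inference("What is our refund policy?")
    # 100 identical requests, but the model ran only once
    print(run_model.calls)
```

Real deployments typically use a shared cache (e.g. Redis) keyed on a hash of the prompt and generation parameters, but the cost math is the same: every cache hit is an inference call you did not pay for.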
Bring your current setup, cost challenges, or scaling questions. This session is open, interactive, and designed for founders, engineers, and anyone running AI systems at scale.
Perfect if you are shipping AI features to users, hitting scaling or cost ceilings, evaluating open versus closed models, or optimizing inference for production workloads.