

Inside OpenAI's Real-Time Access Engine
Provably Correct, Impossibly Fast
In February 2026, OpenAI published "Beyond Rate Limits," a deep dive into how the company rethought access control for Codex and Sora. Instead of choosing between rate limits and usage-based billing, OpenAI built a hybrid real-time system that makes a decision on every request: how much usage is allowed, and where should it be drawn from?
They called it the "decision waterfall."
In this session, Jonah Cohen, Tech Lead for Financial Engineering at OpenAI and the architect behind that system, joins Stigg for a candid look at what it took to build it, why off-the-shelf solutions fell short, and what he would tell engineering teams facing the same problem today.
We'll cover:
Why "metering" and "decisioning" are fundamentally different problems
How OpenAI fuses rate limits, credits, and entitlements in a single synchronous request path
The architecture trade-offs behind provably correct billing at scale
What this means for every AI company shipping usage-based products
Whether you're building your own access engine or evaluating infrastructure to handle it, this is an unusually honest conversation about what it takes to get real-time usage control right.