Cover Image for Inference Time Compute Optimization Workshop
Avatar for Neurometric AI Events

Inference Time Compute Optimization Workshop

Google Meet
Registration
About Event

Inference-time compute optimization is about making AI systems cheaper, faster, and more scalable in production. While most of the industry focuses on training bigger models, the real constraint for startups is what happens after deployment: every prompt, every token, and every millisecond adds up.
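To see why tokens add up, here is a back-of-envelope cost model. All of the numbers (request volume, tokens per request, price per million tokens) are illustrative assumptions, not real provider pricing:

```python
def monthly_cost(requests_per_day: int,
                 tokens_per_request: int,
                 usd_per_million_tokens: float) -> float:
    """Estimate monthly token spend for a deployed AI product."""
    tokens_per_month = requests_per_day * 30 * tokens_per_request
    return tokens_per_month / 1_000_000 * usd_per_million_tokens

# Hypothetical workload: 50k requests/day at ~1,500 tokens each,
# priced at $2 per million tokens.
cost = monthly_cost(50_000, 1_500, 2.0)
print(f"${cost:,.2f}/month")  # → $4,500.00/month
```

Halving average tokens per request, or routing half the traffic to a model that is 10x cheaper, moves this number directly, which is why these levers matter more than raw model quality for unit economics.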

In this workshop, we’ll break down how to design systems that minimize inference cost without sacrificing performance.

We’ll cover practical strategies like model routing between small and large models, quantization, batching, caching, distillation, and when to fine-tune versus rely on prompt engineering. This session is built for founders and engineers shipping AI products who want to understand the economics behind their architecture decisions and build systems that scale sustainably.
