Presented by
GMI Cloud

From Silicon to Solution: Building a World-Class Inference Engine for the AI Builder

Past Event
About Event

Join GMI Cloud and Yujing Qian (Head of Engineering) for a deep dive into how modern inference systems are evolving in the era of next-gen GPUs.

As new architectures like Blackwell push the limits of compute, many production systems face unexpected bottlenecks in latency, cost, and scaling. This session explores what actually changes at the system level — and how to rethink your inference stack for real-world workloads.


What You’ll Learn

• Why traditional inference assumptions break on next-gen GPU architectures
• How to redesign batching, scheduling, and concurrency strategies
• Key architectural shifts for better latency and cost efficiency at scale
• What real production traffic reveals about inference system behavior


Why Attend

If you're building or scaling AI inference systems, this session will give you a clearer framework for moving from raw compute power to production performance.

Join us live at Booth #142.

Location
San José Convention Center & South Hall
150 W San Carlos St, San Jose, CA 95113, USA
Booth 142, GMI Cloud