

GTC 2026 | Workshop 09: LLMs & On-Device AI Architecture
💡 Workshop Focus:
When inference cost becomes a ceiling for growth, your architecture is your competitive advantage. This session covers strategies for coordinating cloud and on-device models: when to run on-device, how to split the workload, and techniques like caching, compression, and routing to balance user experience, privacy, and cost.
Workshop format: 1-to-10 enablement session
🚀 What You’ll Learn:
Understand the capabilities and trade-offs of cloud vs. on-device AI and common architectural patterns
Master the key levers for inference efficiency: latency, throughput, cost, and energy consumption
Design layered invocation strategies: small-model-first, routing, fallback, and caching
🤖 Who It’s For:
AI engineering leads, architects, and product/tech owners
Teams building agents, developer tools, mobile AI, or smart hardware
Projects hitting a wall with inference cost or latency