Inference Performance as a Competitive Advantage
Overview
This session provides an in-depth introduction to FriendliAI and explores how optimized AI inference can become a strategic differentiator for businesses deploying generative AI at scale. Attendees will learn why inference performance is critical in production AI systems, where serving can consume 80-90% of GPU resources, and discover practical techniques for achieving faster response times, lower costs, and seamless scalability.
Session Agenda:
Introduction to FriendliAI and the AI Inference Landscape
Why Inference Performance Matters: Speed, Cost, and Scale
Demonstration of the FriendliAI Suite
Real-World Use Cases and Customer Success Stories
Q&A and Discussion
Key Takeaways / Learning Outcomes
Attendees will walk away with:
A clear understanding of why inference optimization is critical for production AI applications
Knowledge of techniques that can cut inference costs by up to 90% while improving response times
Insights into how continuous batching, speculative decoding, and smart caching accelerate LLM serving
Practical guidance on deploying high-performance inference infrastructure at scale
Understanding of how to turn inference performance into a competitive business advantage
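To make the batching technique named above concrete, here is a minimal toy sketch of iteration-level (continuous) batching. It is illustrative only, not FriendliAI's implementation: `MAX_BATCH`, `Request`, and the token-counting loop are hypothetical stand-ins for a real engine's GPU-memory-aware scheduler and decode step.

```python
from collections import deque

MAX_BATCH = 4  # hypothetical slot count; real engines size this by GPU memory

class Request:
    def __init__(self, rid, total_tokens):
        self.rid = rid
        self.remaining = total_tokens  # tokens left to generate

def continuous_batching(incoming):
    """Iteration-level scheduling: after every decode step, finished
    requests leave the batch and waiting requests join immediately,
    instead of the whole batch draining before new work is admitted."""
    queue = deque(incoming)
    active, finished = [], []
    while queue or active:
        # Admit waiting requests as soon as slots free up
        while queue and len(active) < MAX_BATCH:
            active.append(queue.popleft())
        # One decode step: each active request emits one token
        for req in active:
            req.remaining -= 1
        # Retire completed requests at iteration granularity
        for req in [r for r in active if r.remaining == 0]:
            active.remove(req)
            finished.append(req.rid)
    return finished
```

A short request (one token) finishes and frees its slot after the first iteration, letting a queued request start immediately; with static batching it would have idled until the longest request in its batch completed.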
Who should join?
This session is designed for ML/AI Engineers, MLOps practitioners, and technical teams building and deploying generative AI applications in production environments.
Speakers
Speaker 1: Yunmo Koo (Founding Engineer, FriendliAI)
Speaker 2: Alex Campos (GTM Leader, FriendliAI)
Moderator: Rishav Hada (Applied Scientist, Future AGI)
About FriendliAI
FriendliAI is a generative AI infrastructure company founded in 2021, specializing in high-performance LLM inference. Their flagship product, Friendli Inference Engine, delivers up to 90% cost reduction and 2x+ faster inference through proprietary optimizations including continuous batching (pioneered in their OSDI 2022 Orca paper), speculative decoding, and custom GPU kernels.
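Speculative decoding, one of the optimizations listed above, can be sketched with a toy greedy version. This is a simplified illustration under stated assumptions, not FriendliAI's kernel: `draft_next` and `target_next` are hypothetical single-token predictors, and the per-position verification loop stands in for what a real engine does in one batched forward pass.

```python
def speculative_decode(draft_next, target_next, prompt, k=4, max_new=12):
    """Toy greedy speculative decoding: a cheap draft model proposes k
    tokens per round; the target model verifies them and keeps the
    longest prefix it agrees with, plus one token of its own. Every
    round yields at least one target-approved token, so the output
    matches decoding with the target model alone."""
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # Draft proposes k tokens autoregressively (cheap model)
        draft = []
        for _ in range(k):
            draft.append(draft_next(seq + draft))
        # Target verifies each proposed position; in a real engine these
        # k checks happen in a single batched forward pass
        accepted = []
        for i in range(k):
            t = target_next(seq + accepted)
            accepted.append(t)   # either confirms the draft token...
            if t != draft[i]:
                break            # ...or corrects it and discards the rest
        seq += accepted
    return seq[: len(prompt) + max_new]
```

The speedup comes from the verification step: when the draft model is usually right, the expensive target model approves several tokens per forward pass instead of one, without changing the greedy output.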
About Future AGI
Future AGI is a San Francisco-based advanced AI engineering and optimization platform designed to streamline experimentation, evaluation, optimization, and real-time observability. Traditional AI tools often rely on guesswork due to gaps in data generation, error analysis, and feedback loops. Future AGI eliminates this uncertainty by automating the data layer with multi-modal evaluations, agent optimization, observability, and synthetic data tools, cutting AI development time by up to 95%.
🌐 Follow us on LinkedIn to get the latest updates on events and new launches.