Mastering High-Performance Generative AI Inference on Amazon Elastic Kubernetes Service (EKS) with NVIDIA Run:ai and Outerbounds
Join AWS for an immersive day exploring how to build and scale production-ready Generative AI deployments using Amazon Elastic Kubernetes Service (EKS). Through deep-dive technical sessions and a hands-on workshop, you'll discover how to architect scalable inference pipelines, optimize GPU utilization, and implement efficient model serving strategies on Amazon EKS.
Leading AI innovators including NVIDIA Run:ai and Outerbounds will share their experiences in building state-of-the-art GenAI inference solutions, demonstrating how they've achieved significant performance improvements by building on Amazon EKS. In the hands-on workshop, attendees will get practical experience setting up Amazon EKS clusters optimized for NVIDIA GPU workloads, implementing distributed inference architectures using Ray and vLLM, and establishing comprehensive monitoring and observability.
Whether you're looking to deploy your first language model or serve millions of inference requests, you'll walk away with practical knowledge on how to build and operate production-grade GenAI infrastructure that delivers high performance, cost efficiency, and enterprise-grade reliability on Amazon EKS.
Join leading AI innovators from NVIDIA Run:ai and Outerbounds for an exclusive event —
NVIDIA Run:ai — Lior Balan (Segment Lead, CSP) and Robert Magno (Solutions Architecture Manager)
Outerbounds — Ville Tuulos (Co-founder and CEO)
You'll also have the opportunity to network with AWS experts, fellow AI/ML practitioners, cloud architects, and industry innovators. We'll keep you energized throughout the day with complimentary lunch and refreshments, ensuring you're at your best for learning and collaboration.
Ready to build? Space is limited. Register now to secure your spot.
Who should attend:
AI/ML Engineers and Data Scientists
DevOps and Platform Engineers
Technical Decision Makers
🎯 Please bring a personal laptop for hands-on sessions
NOTE: All attendees MUST bring a physical valid government-issued ID (e.g. driver's license, passport card, passport) to present at check-in. This is mandatory with no exceptions.
09:00 am - 09:30 am PDT: Networking and Breakfast
Breakfast will be provided. Enjoy the break by networking and visiting the live demo booths hosted by the NVIDIA Run:ai team.
09:30 am - 10:00 am PDT: Welcome Address
10:00 am - 11:00 am PDT: Optimizing GPU Utilization for Large-Scale GenAI Inference with NVIDIA Run:ai
Learn how to maximize GPU resource efficiency for your GenAI workloads using NVIDIA Run:ai's fractional GPU technology on Amazon EKS. This session will demonstrate how to overcome common challenges in GPU utilization, including static allocation and resource competition in shared environments. Through practical examples, we'll explore how to implement dynamic GPU allocation, priority-based workload sharing, and automated resource management to achieve significant improvements in GPU utilization.
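To make the fractional-GPU idea concrete, here is a minimal Python sketch of a Kubernetes pod manifest that requests a fraction of a GPU rather than a whole device. The annotation key `gpu-fraction` and the scheduler name `runai-scheduler` are illustrative assumptions — consult the NVIDIA Run:ai documentation for the exact scheduling interface in your cluster.

```python
def fractional_gpu_pod(name: str, image: str, fraction: float) -> dict:
    """Build a Kubernetes pod manifest asking the scheduler for a
    fraction of a single GPU instead of a whole device."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            # Hypothetical fractional-GPU annotation, for illustration only.
            "annotations": {"gpu-fraction": str(fraction)},
        },
        "spec": {
            # Assumed scheduler name; verify against your Run:ai install.
            "schedulerName": "runai-scheduler",
            "containers": [{"name": "inference", "image": image}],
        },
    }

# Two such pods could then share one physical GPU on the same node,
# instead of each pinning a full device through static allocation.
pod = fractional_gpu_pod("llm-serve", "my-registry/vllm:latest", 0.5)
```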
11:00 am - 12:00 pm PDT: Building Differentiated Production-Grade AI Systems with Outerbounds: Bringing Together Data, Models, and Agents on Amazon EKS
As AI moves from demos and prototypes to production systems, it becomes crucial to manage data, code, models, prompts, and agents as a coherent whole. This talk introduces how Outerbounds, an AI platform provider, built its end-to-end, developer-friendly AI stack using open-source Metaflow on AWS. We cover the foundations of scalable compute and container management, then progress to orchestrating large-scale, resilient autonomous agents on Amazon EKS. Finally, we demonstrate a modern CI/CD workflow that unifies offline and online components, enabling versioned data, prompts, and models to be developed and deployed seamlessly as part of a single integrated system.
12:00 pm - 01:00 pm PDT: Lunch Time
Lunch will be provided. Enjoy the break by networking and visiting the live demo booths hosted by the NVIDIA Run:ai team.
01:00 pm - 03:30 pm PDT: Hands-on Workshop: Building and Scaling GenAI Inference Workloads with Amazon EKS
In this hands-on workshop, you'll learn to build and deploy scalable GenAI inference pipelines on Amazon EKS. Through guided exercises, you'll set up GPU-optimized EKS clusters, implement efficient model serving strategies using Ray and vLLM, and configure node auto-scaling for dynamic workloads. You'll also learn to establish comprehensive monitoring and observability using Prometheus and Grafana, implement load balancing for distributed inference, and apply best practices for managing GPU resources. You'll walk away with practical experience in deploying production-grade GenAI workloads that can efficiently scale to handle millions of inference requests.
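As a taste of the model-serving portion of the workshop: once a vLLM deployment is running on EKS, clients typically talk to it over its OpenAI-compatible REST API. The sketch below builds such a request with only the Python standard library; the service URL and model name are placeholders you would replace with your own in-cluster Service or load balancer.

```python
import json
import urllib.request

# Placeholder in-cluster address for a vLLM Service behind a load balancer.
VLLM_URL = "http://vllm-service.default.svc.cluster.local:8000/v1/completions"

def build_completion_request(prompt: str, model: str, max_tokens: int = 64) -> dict:
    """Payload for vLLM's OpenAI-compatible /v1/completions endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }

def complete(prompt: str, model: str = "my-org/my-model") -> str:
    """Send one completion request and return the generated text.
    The model name here is a placeholder, not a recommendation."""
    payload = json.dumps(build_completion_request(prompt, model)).encode()
    req = urllib.request.Request(
        VLLM_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

In the workshop setting, the same endpoint sits behind a Kubernetes Service so that load balancing across Ray/vLLM replicas and node auto-scaling happen transparently to the client.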