Unlocking GPU Performance with CUDA Tile
Join Stephen Jones, one of the inventors of CUDA and one of its foremost experts, for a live discussion about tile-based programming on the GPU using CUDA Tile, one of the most innovative additions to CUDA since its inception. You’ll learn how tile kernels abstract away special-purpose hardware like tensor cores, helping you write code that will remain compatible with future NVIDIA GPU architectures.
In this session, Stephen will cover:
How tile programming simplifies kernel authoring compared to the traditional SIMT (single instruction, multiple threads) model.
An introduction to cuTile Python, which lets you write tile kernels in Python: you divide arrays into tiles that can be operated on in parallel, while the compiler and runtime handle low-level tasks such as block-level parallelism, memory movement, and hardware feature usage.
Using cuTile Python for data-parallel workloads, especially AI and ML applications.
Live Q&A focusing on the new tile-programming paradigm and how to unlock new CUDA capabilities.
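
To make the tile idea from the agenda concrete (dividing arrays into tiles that can be operated on independently), here is a minimal CPU-only sketch in plain Python. It does not use the cuTile Python API. The helper names split_into_tiles and tile_add are hypothetical and exist only for illustration, and the loop over tiles stands in for work that a GPU would schedule in parallel.

```python
def split_into_tiles(n, tile_size):
    """Return (start, stop) index ranges that cover range(n) in fixed-size tiles.

    The last tile may be shorter when n is not a multiple of tile_size.
    """
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    return [(start, min(start + tile_size, n)) for start in range(0, n, tile_size)]


def tile_add(a, b, tile_size=4):
    """Element-wise a + b, computed one tile at a time.

    Illustration only: each loop iteration touches a disjoint slice, so the
    iterations are independent of one another. That independence is what lets
    a tile-based GPU model run them in parallel; here they run sequentially.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out = [0] * len(a)
    for start, stop in split_into_tiles(len(a), tile_size):
        a_tile = a[start:stop]          # "load" a tile of each input
        b_tile = b[start:stop]
        result_tile = [x + y for x, y in zip(a_tile, b_tile)]  # whole-tile operation
        out[start:stop] = result_tile   # "store" the result tile
    return out


if __name__ == "__main__":
    print(split_into_tiles(10, 4))
    print(tile_add(list(range(10)), [1] * 10, tile_size=4))
```

In this sketch the programmer only decides how the array is cut into tiles and what happens to each tile. In the model described above, the mapping of tiles to hardware and the movement of data would be left to the compiler and runtime.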
