

Spark Connect: NVIDIA Accelerator for Spark SQL and MLlib
Please join us 🤝 to learn more about Apache Spark™, Spark Connect, and Spark ML at NVIDIA.
📅 Date: October 29, 2025
⏰ Time: 9:30 AM - 10:30 AM PST (45-minute talk, then Q&A)
📍 Location: online (live streaming to LinkedIn, X & YouTube)
Agenda:
Welcome and Introductions
Talk: GPU Accelerated Apache Spark™ Connect: NVIDIA Accelerator for Spark SQL and MLlib
Q&A
Talk: GPU Accelerated Apache Spark™ Connect: NVIDIA Accelerator for Spark SQL and MLlib
Abstract:
Spark Connect, first included in Apache Spark™ 3.4 and recently extended to MLlib in Spark 4.0+, introduced a new way to run Spark applications over a gRPC protocol. Its benefits include easier adoption by non-JVM clients, decoupling of application versions from the cluster's Spark version, and improved stability and security for the associated Spark clusters.
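To make the client side concrete, here is a minimal Python sketch of how a thin client might address a Spark Connect server. It assumes the standard `sc://host:port/;key=value` connection string and the default port 15002; the helper names `connect_url` and `open_session` are hypothetical and only for illustration, and `open_session` additionally assumes that PySpark with the Connect client is installed and a server is reachable.

```python
def connect_url(host: str, port: int = 15002, **params: str) -> str:
    """Build a Spark Connect connection string: sc://host:port/;key=value;..."""
    suffix = "".join(f";{key}={value}" for key, value in params.items())
    return f"sc://{host}:{port}/{suffix}"


def open_session(url: str):
    """Open a remote session (assumes PySpark's Connect client is installed)."""
    # Imported lazily so the URL helper works without PySpark present.
    from pyspark.sql import SparkSession

    return SparkSession.builder.remote(url).getOrCreate()


if __name__ == "__main__":
    # Only builds and prints the address; no cluster is contacted here.
    print(connect_url("localhost"))
```

Because the client holds only a gRPC address, the application code does not need a local JVM or a Spark distribution matching the cluster.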
In this talk, we will demonstrate how the recent Spark Connect extension for ML, together with Spark SQL’s existing plugin interface, can be used with NVIDIA's open source GPU-accelerated plugins for ML and SQL. The combination enables end-to-end GPU acceleration of Spark applications over Spark Connect with no code changes, delivering up to 9x the performance at an 80% cost reduction.
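As a rough illustration of why no application changes are needed, the acceleration can be switched on entirely through server-side configuration. The sketch below assembles a launch command for a Spark Connect server. `spark.plugins=com.nvidia.spark.SQLPlugin` is the documented way to load the RAPIDS Accelerator for Spark SQL; the `spark.connect.ml.backend.classes` setting and its value are an assumption about how the ML plugin is registered, and the jar paths and helper names are hypothetical placeholders, so check the plugin documentation for your versions.

```python
# Hypothetical jar locations; substitute the artifacts built for your Spark version.
SQL_PLUGIN_JAR = "/opt/jars/rapids-4-spark.jar"
ML_PLUGIN_JAR = "/opt/jars/spark-rapids-ml.jar"


def gpu_connect_server_conf() -> dict:
    """Server-side settings that enable GPU plugins without touching client code."""
    return {
        # Loads the RAPIDS Accelerator through Spark's plugin interface.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        "spark.rapids.sql.enabled": "true",
        # Assumption: the ML backend class is registered via this Spark Connect setting.
        "spark.connect.ml.backend.classes": "com.nvidia.rapids.ml.Plugin",
    }


def server_command(conf: dict, jars: list, script: str = "sbin/start-connect-server.sh") -> list:
    """Assemble the argument list used to start a Spark Connect server."""
    cmd = [script, "--jars", ",".join(jars)]
    for key, value in sorted(conf.items()):
        cmd += ["--conf", f"{key}={value}"]
    return cmd


if __name__ == "__main__":
    # Prints the command for inspection; nothing is launched.
    print(" ".join(server_command(gpu_connect_server_conf(), [SQL_PLUGIN_JAR, ML_PLUGIN_JAR])))
```

Clients keep using the ordinary DataFrame and MLlib APIs; whether a given operation actually runs on the GPU is decided by the server.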
We will introduce a working pattern for Spark Connect with accelerated ETL and ML for use in lakehouses. We will discuss how such an architecture can be used in practice and provide a few industry use cases.