South Bay Systems: Innovative Data Systems Research
Welcome to another edition of South Bay Systems! This time we bring you three wonderful talks from authors at the just-concluded Conference on Innovative Data Systems Research (CIDR):
Pınar Tözün will present her work on xNVMe.
Maximilian Kuschewski will give a quick intro to his CloudSpecs work, and then discuss out-of-memory query processing work from SIGMOD.
Alexander Baumstark will present his work on using NPUs in DBMSs.
Agenda
6:00 PM: Doors open, food and socializing
6:30 PM — 6:50 PM: xNVMe Talk
6:50 PM — 7:10 PM: Spilling Talk
7:10 PM — 7:30 PM: NPU Talk
7:30 PM onward: Community socializing!
Food and beverages will be provided, courtesy of our hosts, Databricks.
Flexible I/O for Database Management Systems with xNVMe
To leverage the capabilities of modern NVMe SSDs, a variety of I/O paths are available today (e.g., libaio, io_uring, and SPDK). However, to avoid the challenges and unpredictability that come with writing code to target such diversity, most data systems still rely on conventional filesystem APIs and synchronous I/O. While this choice (maybe) increases programmer productivity, it leads to sub-optimal utilization of modern NVMe storage. To make the diverse I/O storage paths more accessible to users, Samsung built xNVMe. This talk will focus on our experience integrating xNVMe into a state-of-the-art database system, DuckDB, and demonstrate what this integration enables for DuckDB out of the box.
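To make the motivation concrete, here is a minimal, hypothetical sketch of the kind of unified I/O interface such a library provides: one read call, multiple interchangeable backends. None of the names below come from xNVMe's actual API; only the blocking POSIX path is shown, while an io_uring or SPDK backend would implement the same interface with asynchronous submission.

```cpp
// Hypothetical sketch of a pluggable I/O backend interface, illustrating the
// kind of abstraction layer xNVMe provides for real (libaio, io_uring, SPDK, ...).
// Names like IoBackend and PosixBackend are illustrative, not xNVMe's API.
#include <fcntl.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <memory>

struct IoBackend {
    virtual ~IoBackend() = default;
    // Read `len` bytes at `offset` into `buf`; returns bytes read or -1.
    virtual ssize_t read(void* buf, size_t len, uint64_t offset) = 0;
};

// Conventional path: blocking filesystem I/O via pread(2).
struct PosixBackend : IoBackend {
    explicit PosixBackend(const char* path) : fd_(::open(path, O_RDONLY)) {}
    ~PosixBackend() override { if (fd_ >= 0) ::close(fd_); }
    ssize_t read(void* buf, size_t len, uint64_t offset) override {
        return ::pread(fd_, buf, len, static_cast<off_t>(offset));
    }
    int fd_;
};
// An IoUringBackend or SpdkBackend would implement the same interface with
// asynchronous submission and completion queues; the engine code above stays unchanged.

int main() {
    std::unique_ptr<IoBackend> io = std::make_unique<PosixBackend>("/tmp/data.bin");
    char page[4096];
    ssize_t n = io->read(page, sizeof(page), 0);
    std::printf("read %zd bytes\n", n);
}
```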
Speaker Bio
Pınar Tözün is an Associate Professor and Head of the Data, Systems, and Robotics Section at the IT University of Copenhagen (ITU). Before ITU, she was a research staff member at IBM Almaden Research Center. Prior to joining IBM, she received her PhD from EPFL. Her thesis received an ACM SIGMOD Jim Gray Doctoral Dissertation Award Honorable Mention in 2016. Her research focuses on resource-aware machine learning, performance characterization of data-intensive systems, and the scalability and efficiency of data-intensive systems on modern hardware.
Spilling Secrets: SSD Query Processing with Near-In-Memory Performance
What happens when your query engine runs out of memory? Fast in-memory systems crash, while disk-based systems crawl along at hard-disk-era speeds. But NVMe SSDs are a game changer: a modern NVMe array can deliver over 100 GB/s of throughput at a fraction of DRAM's cost, if your engine can utilize it.
In this talk, I'll show how our prototype query engine, built on the Unified Materialization Management Interface (Umami), runs analytical queries over 10TB of TPC-H data with just 384GB of RAM, spilling terabytes to SSD while retaining 89% of in-memory performance. Umami uses two main techniques: Adaptive Materialization, which allows hash-based operators to decide on-the-fly whether to partition and spill, and Self-Regulating Compression, which optimizes spilling throughput based on current workload and available hardware.
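As a rough illustration of the first idea (and only that; the sketch below is not Umami code and the names are hypothetical), a hash-based operator can keep building its table in memory until a budget is exceeded, then switch on the fly to routing overflow rows into partitions destined for SSD:

```cpp
// Illustrative-only sketch of adaptive materialization: a hash aggregation that
// builds in memory while it fits, and switches to partitioned spilling once a
// memory budget is exceeded. Names (MemoryBudget, SpillPartition, ...) are
// hypothetical and not taken from Umami.
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Row { uint64_t key; int64_t value; };

struct SpillPartition {
    std::vector<Row> buffered;                 // stand-in for an SSD-backed run
    void append(const Row& r) { buffered.push_back(r); }  // real code would write compressed blocks
};

class AdaptiveHashAgg {
public:
    AdaptiveHashAgg(size_t budget_bytes, size_t num_partitions)
        : budget_bytes_(budget_bytes), partitions_(num_partitions) {}

    void consume(const Row& r) {
        if (!spilling_ && estimated_bytes() > budget_bytes_) {
            spilling_ = true;                  // on-the-fly decision to partition and spill
        }
        if (spilling_ && table_.find(r.key) == table_.end()) {
            partitions_[r.key % partitions_.size()].append(r);  // overflow goes toward SSD
            return;
        }
        table_[r.key] += r.value;              // in-memory fast path
    }

private:
    size_t estimated_bytes() const { return table_.size() * sizeof(Row) * 2; }  // rough estimate

    size_t budget_bytes_;
    bool spilling_ = false;
    std::unordered_map<uint64_t, int64_t> table_;
    std::vector<SpillPartition> partitions_;
};
```

In a real engine, the spilled partitions would later be read back and aggregated one partition at a time, and the compression codec used when writing them would be tuned against observed SSD and CPU throughput, which is where the second technique comes in.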
Speaker Bio
Maximilian Kuschewski is a fourth-year Ph.D. student at the Technical University of Munich (TUM). His research interests include high-performance query processing, distributed systems, modern hardware, compression, and programming languages.
Does A Fish Need a Bicycle? The Case for On-Chip NPUs in DBMS
Neural Processing Units (NPUs) are specialized accelerators designed for efficient ML inference and training. Despite their rapid adoption driven by LLMs, DBMSs have largely overlooked NPUs, assuming limited benefits beyond ML workloads. This talk challenges that view. Modern consumer NPUs are tightly integrated into CPU packages, offering high-bandwidth, low-latency access to main memory and reduced data movement compared to discrete accelerators like GPUs. These properties make NPUs a promising yet unexplored platform for accelerating core database tasks. In this talk, we evaluate the suitability of NPUs for query processing, indexing, and maintenance, presenting early experimental results on recent NPU-enabled systems. We will discuss performance characteristics, programmability constraints, and system-level trade-offs, providing initial insights into how available NPU architectures can be integrated into DBMS engines.
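One concrete way to picture "core database tasks on a matrix engine" (purely illustrative here, and not necessarily the mapping used in the talk) is that a grouped sum can be phrased as a one-hot group matrix multiplied by a value vector, exactly the dense kernel an NPU is built for. The plain C++ below stands in for whatever kernel an NPU SDK would actually run:

```cpp
// Illustrative only: a group-by-sum expressed as a matrix-vector product,
// the kind of dense kernel a matrix-oriented NPU executes natively.
// result[g] = sum over rows i of onehot[g][i] * values[i].
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> group = {0, 1, 0, 2, 1, 0};        // group id per row
    std::vector<float> values = {1, 2, 3, 4, 5, 6};     // column to aggregate
    const int num_groups = 3;

    // Build the one-hot "group membership" matrix (num_groups x num_rows).
    std::vector<std::vector<float>> onehot(num_groups, std::vector<float>(values.size(), 0.0f));
    for (size_t i = 0; i < group.size(); ++i) onehot[group[i]][i] = 1.0f;

    // The matrix-vector product an NPU's matrix engine would perform.
    std::vector<float> sums(num_groups, 0.0f);
    for (int g = 0; g < num_groups; ++g)
        for (size_t i = 0; i < values.size(); ++i)
            sums[g] += onehot[g][i] * values[i];

    for (int g = 0; g < num_groups; ++g) std::printf("group %d: %.1f\n", g, sums[g]);
}
```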
Speaker Bio
Alexander Baumstark is a research associate and PhD candidate at TU Ilmenau, Germany, in the Databases and Information Systems Group led by Prof. Kai-Uwe Sattler. His research focuses on leveraging emerging hardware technologies, such as modern memory systems (PMem, PIM) and hardware accelerators including GPUs and NPUs, to accelerate query processing in relational and graph DBMSs. In recent work, he investigated how tightly integrated accelerators (Intel DSA, IAA, on-chip NPUs) can be systematically exploited to improve the performance and efficiency of modern DBMS query execution.