

Apache Arrow / Parquet - June 2026 meetup in Paris
Details
We’re excited to announce the first-ever Apache Arrow and Parquet meetup in Paris! This meetup will be hosted on June 18th by Datadog, in their offices at 21 Rue de Châteaudun, 75009 Paris.
Arrow and Parquet are two widely used, open source, language-agnostic data formats for efficient representation of columnar data. They are often complementary: Arrow for in-memory data, Parquet for disk storage. If you’re using Arrow or Parquet, looking for insights, or want to meet other community members, this meetup is for you.
The meetup will feature:
🎤 Five talks by Arrow and Parquet community members
🤝 Some time for informal discussion and networking
🍕 Drinks and food
Schedule
The meetup will start at 6:30 pm and will last until 9:30 pm.
Important
Please register if (and only if) you plan to show up, so that we can make the best use of the limited number of seats.
Please check in at the reception desk on the ground floor (21 Rue de Châteaudun). An agent will welcome you there and check your identity. Please bring the QR code sent by Luma and/or an ID (passport or EU identity card).
Talks
Talks will be roughly 15 minutes each, with 5 more minutes for questions and answers. All talks will be in English.
📌 Zero-Materialization Merging: Concatenating Parquet Files via REE-Encoded Arrow
By Damien Profeta (Datadog)
Traditional compaction is often too CPU-intensive to solve the “small file problem” at scale. This talk introduces Zero-Materialization Merging, which bypasses the costly decode-and-re-encode cycle by concatenating Parquet files through direct Run-End Encoded (REE) Arrow manipulation.
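For readers unfamiliar with run-end encoding, here is a minimal pure-Python sketch of the general idea: a column is stored as a list of run values plus the index at which each run ends. It only illustrates the layout concept behind Arrow's run-end encoded arrays and is not the merging technique presented in the talk. The function names `run_end_encode` and `run_end_decode` are hypothetical and exist only for this example.

```python
from typing import Any, List, Tuple


def run_end_encode(values: List[Any]) -> Tuple[List[int], List[Any]]:
    """Return (run_ends, run_values) for a list of values.

    run_ends[i] is the exclusive end index of the i-th run, so the last
    entry always equals len(values).
    """
    run_ends: List[int] = []
    run_values: List[Any] = []
    for index, value in enumerate(values):
        if run_values and run_values[-1] == value:
            # Same value as the current run: move the run's end forward.
            run_ends[-1] = index + 1
        else:
            # New value: start a run that ends just after this element.
            run_values.append(value)
            run_ends.append(index + 1)
    return run_ends, run_values


def run_end_decode(run_ends: List[int], run_values: List[Any]) -> List[Any]:
    """Expand (run_ends, run_values) back into the original list."""
    decoded: List[Any] = []
    start = 0
    for end, value in zip(run_ends, run_values):
        decoded.extend([value] * (end - start))
        start = end
    return decoded


column = ["a", "a", "a", "b", "b", "c"]
run_ends, run_values = run_end_encode(column)
print(run_ends, run_values)  # [3, 5, 6] ['a', 'b', 'c']
```

Long runs of repeated values collapse to a single entry each, which is why this representation can be cheap to manipulate without expanding the data.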
📌 The Sparrow ecosystem: Arrow in modern minimal C++20
By Alexis Placet and Johan Mabille (QuantStack)
The Sparrow ecosystem is a collection of open source projects implementing Apache Arrow specifications in modern C++20, from the columnar format and extension types to companion libraries for IPC and Python interoperability.
📌 GDAL: integrating columnar formats into a row-oriented framework
By Even Rouault (Spatialys)
This talk explores how the open source Geospatial Data Abstraction Library (GDAL) has incorporated columnar data access through Arrow and Parquet. It highlights the benefits of this approach, as well as the limitations and challenges involved in bridging columnar and row-based paradigms.
📌 Efficient data storage with deduplication and Parquet
By Quentin Lhoest (Hugging Face)
The story behind Hugging Face Storage Buckets. We will talk about deduplication with Content-Defined Chunking, and how to enable deduplication for Parquet with open source tools.
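As background, content-defined chunking can be sketched in a few lines of pure Python. The example below is a simplified, gear-style rolling hash written for illustration, not Hugging Face's implementation or the tooling covered in the talk. The constants `MASK`, `MIN_CHUNK`, and `MAX_CHUNK`, the `GEAR` table, and the `chunk` function are all assumptions made for this sketch.

```python
import hashlib
from typing import List

# Hypothetical parameters chosen for illustration only.
MASK = (1 << 6) - 1   # cut when the low 6 bits of the hash are zero
MIN_CHUNK = 16        # never cut before this many bytes
MAX_CHUNK = 256       # always cut at this many bytes

# Deterministic table mapping each byte value to a 32-bit integer.
GEAR = [
    int.from_bytes(hashlib.sha256(bytes([b])).digest()[:4], "big")
    for b in range(256)
]


def chunk(data: bytes) -> List[bytes]:
    """Split data at boundaries chosen from the content itself."""
    chunks: List[bytes] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Rolling hash: older bytes are shifted out of the low bits.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


# Deterministic sample data, then the same data with a few bytes prepended.
original = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(256))
edited = b"new header" + original

original_chunks = chunk(original)
edited_chunks = chunk(edited)
shared = set(original_chunks) & set(edited_chunks)
print(len(original_chunks), len(edited_chunks), len(shared))
```

Because boundaries depend on the bytes themselves rather than on fixed offsets, inserting data near the start shifts only the first few chunks; the later chunks are byte-identical and can be stored once.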
📌 Arbalister: instantly open Parquet files in JupyterLab
By Antoine Prouvost (QuantStack)
Parquet is a file format standard, yet inspecting a Parquet file in JupyterLab requires starting a notebook and writing data queries. Arbalister is an open source JupyterLab extension that lets you immediately view the contents of a Parquet file in a memory-efficient way, just by double-clicking on the file.