Avatar for NYC Data Lakehouse Events
Events in NYC about the data Lakehouse.
89 Going

NYC Open Lakehouse Meetup: Iceberg Edition

About Event

📅 Date: November 11, 2025
📍 Venue: 307 West 38th St, Suite 1505, New York, NY 10018
🔗 Event Page: Coming Soon

Join us in the heart of New York City for an evening of community, learning, and networking at the NYC Open Lakehouse Meetup: Iceberg Edition. This meetup brings together data practitioners, engineers, and open-source enthusiasts to explore the rapidly evolving world of data lakehouses, with a spotlight on Apache Iceberg.

Organized by leading voices in the ecosystem (Dremio, Treeverse/lakeFS, RisingWave, and Ryft), this event is designed to spark discussion, share best practices, and showcase how open technologies are shaping the future of data platforms.

Tentative Schedule

  • 6:00 – 6:30 PM | Doors Open & Networking (Free book raffle sign-up)

  • 6:30 – 8:00 PM | Talks & Presentations (Titles and descriptions coming soon)

  • 8:00 – 8:30 PM | Networking, Raffle, and Wrap-Up

Whether you’re already working with Iceberg or just curious about the open lakehouse ecosystem, this is a great opportunity to connect with peers, hear from experts, and take part in building the future of data.

---TALKS (15 Minutes Each)---

lakeFS

Speaker: Oz Katz, Co-founder and CTO, lakeFS

Title: Version Control Beyond Tables: Managing Iceberg and Multimodal Data with lakeFS

Description: 

Apache Iceberg has transformed how we manage structured data - bringing schema evolution, atomic commits, and reliable time travel to data lakes. Yet most modern data workflows span far beyond tables: pipelines mix structured Iceberg datasets with unstructured files, ML models, configurations, and logs.

In this session, we’ll explore how lakeFS extends Iceberg’s foundation to provide consistent version control and governance across all data modalities. With lakeFS, teams can manage Iceberg tables and other assets in a single versioned repository and use Pull Requests to promote data safely through environments.

We’ll also cover how automated hooks and data-quality checks can run before merges - ensuring that every data change meets validation criteria before reaching production. The result: durable, auditable, and reproducible data operations that scale across analytics, ML, and governance use cases - all while preserving Iceberg’s familiar model for structured data.

Ryft

Speaker: Yuval Yogev

Title: Streaming with Iceberg: From Zero To Hero

Description:

Streaming data into Iceberg is gaining traction in modern data platforms, but it brings its own set of challenges that go beyond the usual batch processing problems. In this talk, we’ll dive into the best practices and advanced tips for building reliable and efficient streaming pipelines with Iceberg.

We’ll cover some of the trickier aspects of streaming, like dealing with the constant creation of small files and how Iceberg’s architecture can amplify their impact on performance and storage. You’ll learn practical ways to address these issues, such as optimizing partitioning and sorting, fine-tuning write configurations, managing the cost and complexity of compaction in high-throughput scenarios, and handling late-arriving data. We’ll also look at challenges like working with large numbers of manifests and keeping query planning times and performance consistent.

Using real-world examples and practical insights, this session will give you the tools and knowledge to tackle these challenges and build efficient, cost-effective pipelines. Whether you’re scaling an existing streaming platform or building a new one with Iceberg, you’ll leave with actionable takeaways to help you build a robust streaming platform.

RisingWave

Speaker: Rayees Pasha, CPO at RisingWave Labs

Title: The Equality Delete Problem in Apache Iceberg

Description:

Deletes in Apache Iceberg are more complex than they appear. Position deletes work well when row locations are known, but they don’t suit streaming CDC workloads where data changes continuously. Equality deletes address that by filtering rows on a key; however, they introduce their own issues: delete files can pile up, queries may slow down, and engine support varies widely.

In this talk, we’ll break down how equality deletes function, why they cause headaches, and what engineers can do about them. We’ll cover hybrid approaches that combine position and equality deletes, how compaction keeps queries fast, and what it takes to run CDC pipelines into Iceberg smoothly in production.
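
For attendees new to the two delete styles, here is a toy Python sketch of the difference. It is an in-memory model only, not Iceberg's API and not the speaker's material; `DataFile`, `apply_position_deletes`, and `apply_equality_deletes` are hypothetical names made up for illustration. The one rule borrowed from the Iceberg spec is that an equality delete applies only to data files with a strictly lower data sequence number.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    """Hypothetical stand-in for an Iceberg data file."""
    path: str
    seq: int     # data sequence number of the commit that added the file
    rows: list   # rows as dicts, in file order


def apply_position_deletes(data_file, positions):
    """Drop rows by ordinal position; the writer must already know where each row lives."""
    return [row for i, row in enumerate(data_file.rows) if i not in positions]


def apply_equality_deletes(data_file, eq_deletes):
    """Drop rows whose key matches a delete committed after the file was written.

    eq_deletes is a list of (delete_seq, key_columns, key_values) tuples.
    """
    live = []
    for row in data_file.rows:
        deleted = any(
            data_file.seq < delete_seq
            and tuple(row[col] for col in key_columns) == key_values
            for delete_seq, key_columns, key_values in eq_deletes
        )
        if not deleted:
            live.append(row)
    return live


# A CDC-style update of id=1: delete by key at seq 2, re-insert at seq 3.
old_file = DataFile("a.parquet", seq=1, rows=[{"id": 1, "v": "x"}, {"id": 2, "v": "y"}])
new_file = DataFile("b.parquet", seq=3, rows=[{"id": 1, "v": "x2"}])
deletes = [(2, ("id",), (1,))]

old_live = apply_equality_deletes(old_file, deletes)   # old id=1 row is filtered out
new_live = apply_equality_deletes(new_file, deletes)   # re-inserted row survives
pos_live = apply_position_deletes(old_file, {0})       # same effect, but needs the row position
```

In this model the writer of an equality delete never has to look up where the old row is stored, which is what makes it attractive for streaming CDC. The cost moves to readers: every row in every older file is checked against every accumulated delete, which is why delete files piling up slows queries until compaction rewrites the data.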

Dremio

Speaker: Alex Merced

Title: The State of Discussions over Apache Iceberg v4 So Far

Description:

Apache Iceberg has entered a pivotal stage in its evolution, with the community now deep in discussions around what will become the v4 specification. This talk provides an overview of those conversations so far, examining the motivations behind v4 (scalability, metadata efficiency, and operational simplicity) and how they build on the foundations laid in v3. It highlights emerging proposals such as relative path metadata, improved column-level statistics, and new scan planning endpoints designed to reduce metadata overhead and enhance real-time performance. The session also explores key debates around migration strategies, backward compatibility, and integration across engines and catalogs. Attendees will gain a clear understanding of where consensus is forming, what challenges remain, and how v4 may reshape Iceberg’s role as the open standard for the data lakehouse in the years ahead.

Location
307 W 38th St
New York, NY 10018, USA