122 Going

Seattle Apache Iceberg™ Community Meetup

Hosted by Kevin Liu & 4 others
Register to See Address
Kirkland, Washington
About Event

Seattle Apache Iceberg™ Community Meetup! 🧊❄️🍦

Join us on October 23rd (Thursday) from 5:00-8:30 PM at the Google Kirkland Office.

⚠️ 🚨 Registration closes at end of day on Sunday, 10/19. 🚨⚠️

Please sign up early!

Connect with fellow enthusiasts, share insights, and dive into the latest developments in the Apache Iceberg™ ecosystem! Whether you're a seasoned pro or new to Apache Iceberg, this meetup is the perfect place to exchange ideas and spark innovation.

Agenda

5:00p - 6:00p: Doors Open & Networking 💃

6:00p - 7:30p: Welcome Remarks & Presentations!

7:30p - 8:30p: More Networking 🕺

The event will focus on innovations in Apache Iceberg (https://iceberg.apache.org/).

We will discuss topics around Open-Source Data Analytics, Open Table Formats (OTF), software concepts like Transactional Data Lakes or Lakehouse, advancements in AI/ML including generative AI, and many more topics of mutual interest that leverage Apache Iceberg.

During the sessions, we will share tips for getting involved in the community. You will also learn how the community is collaborating to grow the technology, and about software and solutions that ease problem solving and improve user experiences.

Presentations

Apache Iceberg V4 Adaptive Metadata Tree

Amogh Jahagirdar (Staff Software Engineer @ Databricks)

The current Apache Iceberg content metadata tree has a manifest list at the root of each snapshot, pointing to data/delete manifests, which in turn point to different data/delete files on disk. This provides effective pruning and tracking but introduces latency for small commits.

This talk introduces the V4 adaptive metadata tree, a newly proposed metadata tree structure that enables single-file commits for small writes while maintaining scalable organization for large tables. By embedding column statistics at all levels in the tree, it enables even more efficient pruning, reduces planning overhead, and further optimizes data/delete file planning. The new structure also makes it possible to identify file additions and removals for change detection in each snapshot without comparing to the previous snapshot.

Starting First Iceberg Table at Uber

Xuanyi Li (Senior Software Engineer @ Uber)

For years, Uber has relied on Spark, Flink, and Presto to power our massive data ecosystem. As we evolve toward a modern lakehouse architecture to manage data more efficiently at Uber's scale, the popularity and promise of Apache Iceberg caught our attention. This talk will focus on the experience of onboarding a major production table to Iceberg. We will cover the early successes, the technical challenges we encountered, and our plans for wider adoption across the platform.

Simplifying Iceberg Ingestion and Table Maintenance

Sida Shen (Product Manager @ CelerData)

Working with Apache Iceberg often means juggling extra services for ingestion pipelines and background compaction. But those workflows don’t have to be so heavy. This talk dives into practical strategies for reducing small-file problems and keeping data immediately queryable, from writing optimally sized files at ingest to triggering compaction only when it’s actually needed. We’ll share benchmark results that highlight what’s possible in open source Iceberg today: up to ~5× faster writes on highly partitioned tables, ~100× fewer small files, and stable performance with no OOMs even with thousands of partitions. We’ll look at how a modern query engine can bring these techniques together, cutting down on extra services while still keeping tables healthy.

Sida Shen is a contributor to the StarRocks project and a product manager at CelerData. As an engineer with a background in building machine learning and big data infrastructures, he oversees the company’s market research while working closely with engineers and developers across the analytics industry to tackle challenges related to data lakehouse analytics.

Iceberg Use Cases at DoorDash

Vignesh Chandramohan (Engineering Manager @ DoorDash)

Open table formats like Iceberg enable multiple engines to operate on a single copy of data. Multiple open source and proprietary engines support Iceberg, but the feature set they support varies. We will share our experience of using Iceberg for a set of use cases at DoorDash, some of the gaps that we see, and upcoming features that we are looking forward to.
