153 Going

Real Time Data Lakes and AI

Hosted by Open Source Analytics Community & 4 others
About Event

The Open Source Analytics Community is excited to bring an evening of insights and networking to Seattle, featuring experts from Altinity, CelerData, and AWS.

Real-time databases are converging with data lakes to lower storage costs, improve performance, and make data more accessible for AI and data science. Our speakers will share challenges, solutions, and real-world experiences as organizations move from closed storage systems to open formats like Apache Iceberg.

Join us in Seattle to learn how to build high-performance real-time data lake systems. Talks will be followed by networking, with food and drinks provided.

Big thanks to AWS for hosting this event!


Description of the Talks

Building a Foundation for AI with ClickHouse® and Apache Iceberg Storage - Robert Hodges, CEO @ Altinity

  • AI applications need data. Lots of it. Altinity's Project Antalya adapts open source ClickHouse® to separate compute from storage on shared Iceberg table data. The result: fast, cheap, flexible queries that extend the life of real-time analytic applications and lay the foundation for handling new AI use cases on the same datasets. We cover architecture, performance results, roadmap, and how to get started yourself.

Achieving Data Warehouse Performance on Apache Iceberg - Kevin Chen, Product Lead @ CelerData

  • This talk dives into technical optimizations that deliver low-latency, high-concurrency queries on Apache Iceberg without sacrificing openness. Together, we'll examine what kills performance when querying Iceberg, highlight best practices that make queries faster, and evaluate query engine optimizations for Iceberg, including handling position and equality delete files, distributed metadata parsing, and more. You'll hear real-world stories from leading enterprises that have used these lessons to optimize Apache Iceberg performance at scale, and you'll walk away with actionable techniques for making your Iceberg lakehouse faster.

Managing Apache Iceberg Tables with Amazon S3: High Performance and Interoperability at Scale - Prachi Gupta, Sr. Data & AI Solution Architect @ AWS

  • Amazon S3 Tables delivers a fully managed storage solution for Apache Iceberg data lakes, offering high performance and seamless interoperability across analytics platforms. We'll dive into how S3 Tables provides up to 10x higher transactions per second compared to Iceberg tables stored in standard S3 buckets, while maintaining vendor-agnostic compatibility through the Iceberg REST Catalog (IRC). The session covers key features including automated maintenance, intelligent compaction strategies, and flexible integration options that let both AWS native services and third-party applications work with the same datasets. Learn how organizations can use S3 Tables to build high-performance data lakes without sacrificing interoperability or getting locked into proprietary formats. Real-world examples will show how enterprises are using S3 Tables to manage massive datasets while maintaining open standards compliance and cross-platform accessibility.

Location
Amazon Nitro North
2250 7th Ave, Seattle, WA 98121, USA