Real-Time Data Lakes and AI
The Open Source Analytics Community is excited to bring an evening of insights and networking to Seattle, featuring experts from Altinity, CelerData, and AWS.
Real-time databases are converging with data lakes to lower storage costs, improve performance, and make data more accessible for AI and data science. Our speakers will share challenges, solutions, and real-world experiences as organizations move from closed storage systems to open formats like Apache Iceberg.
Join us in Seattle to learn how to build high-performance real-time data lake systems. Talks will be followed by networking, with food and drinks provided.
Big thanks to AWS for hosting this event!
Speakers
Description of the Talks
Building a Foundation for AI with ClickHouse® and Apache Iceberg Storage - Robert Hodges, CEO @ Altinity
AI applications need data. Lots of it. Altinity's Project Antalya is adapting open source ClickHouse® to introduce separation of compute and storage on shared Iceberg table data. The result: fast, cheap, flexible queries that extend the life of real-time analytic applications and lay the foundation for handling new AI use cases on the same datasets. We'll cover architecture, performance results, the roadmap, and how to get started yourself.
Achieving Data Warehouse Performance on Apache Iceberg - Kevin Chen, Product Lead @ CelerData
This talk dives into technical optimizations that deliver low-latency, high-concurrency queries on Apache Iceberg without sacrificing openness. Together, we'll examine what kills performance when querying Iceberg, highlight best practices that make queries faster, and evaluate query engine optimizations for Iceberg, including handling position and equality delete files, distributed metadata parsing, and more. You'll hear real-world stories from leading enterprises that have used these lessons to optimize Apache Iceberg performance at scale, and you'll walk away with actionable techniques for making your Iceberg lakehouse faster than ever.
Managing Apache Iceberg Tables with Amazon S3: High Performance and Interoperability at Scale - Prachi Gupta, Sr. Data & AI Solution Architect @ AWS
Amazon S3 Tables delivers a fully managed storage solution for Apache Iceberg data lakes, offering high performance and seamless interoperability across analytics platforms. We'll dive into how S3 Tables provides 10x faster transactions compared to standard S3 buckets while maintaining vendor-agnostic compatibility through the Iceberg REST Catalog (IRC). The session covers key features including automated maintenance, intelligent compaction strategies, and flexible integration options that let both AWS native services and third-party applications work with the same datasets. Learn how organizations can use S3 Tables to build high-performance data lakes without sacrificing interoperability or getting locked into proprietary formats. Real-world examples will demonstrate how enterprises are using S3 Tables to manage massive datasets while maintaining open standards compliance and cross-platform accessibility.
