

Real Time Data Lakes and AI
The Open Source Analytics Community is excited to bring an evening of insights and networking to Atlanta, featuring experts from Altinity, RisingWave, LanceDB, and AWS.
Real-time databases are converging with data lakes to lower storage costs, improve performance, and make data more accessible for AI and data science. Our speakers will share challenges, solutions, and real-world experiences as organizations move from closed storage systems to open formats like Apache Iceberg.
Join us in Atlanta to learn how to build high-performance real-time data lake systems. Talks will be followed by networking, with food and drinks provided.
Big thanks to AWS for hosting this event!
Speakers
Josh Lee, Senior Developer Advocate @ Altinity
Rayees Pasha, CPO @ RisingWave
Lu Qiu, Database Engineer @ LanceDB
John Malloy @ AWS
Description of the Talks
Adapting ClickHouse® to Use Apache Iceberg Storage - Josh Lee, Developer Advocate @ Altinity
Covers Altinity's Project Antalya, which adapts open source ClickHouse to separate compute from storage by using Iceberg tables as the storage layer. The talk includes architecture, performance results, and roadmap.
Streaming-first approach to Iceberg with RisingWave - Rayees Pasha, CPO @ RisingWave
This session gives an overview of the technical challenges of building a new Iceberg table engine purpose-built for streaming workloads. The talk will highlight how our team has built key end-to-end capabilities for Iceberg table management, including Iceberg merge-on-read queries, serverless compaction, and Iceberg table sharing that allows direct queries from other engines. A key feature of this project is the native Iceberg compaction service, written in Rust using Apache DataFusion and Apache Iceberg-Rust as foundational components.
Multimodal AI Lakehouse with Lance & LanceDB - Lu Qiu, Database Engineer @ LanceDB
The next wave of AI applications demands seamless, scalable access to text, images, embeddings, and other complex modalities. Yet current lakehouse solutions still force teams into closed systems for vector search, full-text search, or feature engineering, reintroducing data silos. In this talk, we introduce Lance, a next-generation columnar data format optimized for AI, and LanceDB, the multimodal lakehouse built on top of it. Together, they provide low-latency access; unified vector, full-text, and SQL search; and flexible schema evolution across the entire multimodal AI lifecycle, from application serving to feature engineering and large-scale training. They empower innovators like Midjourney, WorldLabs, and Runway to build open, performant, production-grade multimodal systems at scale.
John's talk will be announced soon!
Speaker Bios
Josh Lee - Whether it’s operators or observability, agile or accessibility, Josh’s expertise shines because he is passionate about all of it. He’s been building software for over a decade and loves sharing experiences via public speaking. He is a Developer Advocate for Altinity where he helps create educational content about ClickHouse and OpenTelemetry, and he is a contributor to the OpenTelemetry project. Connect with Josh on LinkedIn.
Rayees Pasha - Rayees has a strong background in diversified data access management technologies. He holds Master’s degrees in Computer Science from the University of Memphis and the University of Arizona. Connect with Rayees on LinkedIn.
Lu Qiu - Lu is a Database Engineer at LanceDB, where she builds distributed vector databases and integrates Lance with the big data ecosystem. She developed the distributed system Alluxio as a PMC maintainer. She's also a Data on Kubernetes Ambassador and Kubernetes community evangelist, bridging AI data infrastructure with cloud-native technologies. Connect with Lu on LinkedIn.