

Real Time Data Lakes and AI
The OSA Community is proud to present an evening of insights and networking in San Francisco, featuring experts from Altinity, Dremio, RisingWave, LanceDB, and AWS.
Real-time databases are integrating with data lakes to reduce storage costs and share data with AI and data science workloads. Please join us to hear a range of experts share current problems and solutions as they navigate the transition from closed storage models to open table formats like Apache Iceberg.
Join us in San Francisco for an evening with experts from Altinity, Dremio, RisingWave, LanceDB, and AWS. Learn how to build high-performance real-time data lake systems. Networking follows the presentations. Food and drink provided!
Many thanks to Sentry for generously providing the venue and catering.
Speakers
Robert Hodges, CEO @ Altinity
Alex Merced, Head of DevRel @ Dremio
Yingjun Wu, Founder and CEO @ RisingWave Labs
Lei Xu, Cofounder & CTO @ LanceDB
John Malloy @ AWS
Description of the Talks
Introducing Hybrid Tables from Project Antalya: Combine MergeTree and Iceberg Data in a Single ClickHouse® Table - Robert Hodges, CEO @ Altinity.
Open source ClickHouse® stores data in replicated MergeTree tables. It's fast but expensive and hard to manage as tables become large. Our goal at Project Antalya is to let users extend MergeTree tables transparently onto cheap, shared Apache Iceberg storage. This talk introduces the Hybrid Table Engine, a new feature of the Antalya 25.8 build. Hybrid tables allow users to create a single table that points to hot data on MergeTree and cold data on Apache Iceberg. We'll show you how hybrid tables work, how to set them up yourself, and our roadmap for making them better.
From Data Chaos to Autonomous Optimization: Iceberg Lakehouses for AI - Alex Merced, Head of DevRel @ Dremio.
Agentic AI requires speed and reliability, yet too often enterprises face query delays, manual tuning, and fragile architectures. Apache Iceberg brings order to the data lake, but managing catalogs, compaction, and performance at scale can be overwhelming. This talk demonstrates how Dremio eliminates these roadblocks by pairing Iceberg’s open table format with Polaris-powered metadata, autonomous Reflections, and Iceberg-native optimizations. Attendees will see how to transform raw, siloed data into an intelligent lakehouse ready for AI-driven decision-making.
Are You Sure Your Lakehouse Is Interoperable? - Yingjun Wu, Founder and CEO @ RisingWave Labs
Interoperability is one of the core promises of the data lakehouse. Different engines should be able to read and write the same tables without friction. But deletes in Apache Iceberg show how fragile that promise can be. Position deletes are straightforward when row locations are stable, yet they do not work well with streaming CDC pipelines where data is always in motion. Equality deletes close that gap by filtering rows by key, but they bring their own problems. Delete files can pile up, queries can slow down, and engines often handle them inconsistently. In this talk, we will examine how deletes complicate interoperability in Iceberg, explore hybrid strategies that combine position and equality deletes, and explain why compaction is essential for keeping queries efficient across engines. We will also share lessons from running CDC pipelines into Iceberg in production and what it really takes to preserve interoperability in practice.
LanceDB, a Multimodal Lakehouse for AI - Lei Xu, Cofounder & CTO @ LanceDB
The next wave of AI applications demands seamless, scalable access to text, images, embeddings, and other complex modalities. Yet current lakehouse solutions still force teams into closed systems for vector search, full-text search, or feature engineering, reintroducing data silos. In this talk, we introduce Lance, a next-generation columnar data format optimized for AI, and LanceDB, the multimodal lakehouse built on top of it. Together, they provide low-latency access, unified vector, full-text, and SQL search, and flexible schema evolution across the entire multimodal AI lifecycle, from application serving to feature engineering and large-scale training. They empower innovators like Midjourney, Runway, and Netflix to build open, performant, production-grade multimodal systems at scale.
Biography of Speakers
Robert Hodges - Robert is the CEO of Altinity, an enterprise provider for the ClickHouse data warehouse. He's also a database geek with experience on at least 20 DBMS types. Robert caught the Kubernetes bug at VMware in 2018.
Connect with Robert on LinkedIn.
Alex Merced - Alex is Head of DevRel for Dremio and co-author of "Apache Iceberg: The Definitive Guide" from O'Reilly. He has worked as a developer and instructor for companies like GenEd Systems, Crossfield Digital, CampusGuard and General Assembly.
Alex is passionate about technology and has put out tech content through blogs, videos and his podcasts Datanation and Web Dev 101. He has contributed a variety of libraries in the JavaScript & Python worlds including SencilloDB, CoquitoJS, dremio-simple-query and more.
Connect with Alex on LinkedIn.
Yingjun Wu - Yingjun is the founder of RisingWave Labs (https://www.risingwave.com/), a database company developing RisingWave, a distributed SQL database for stream processing. Before running the company, Yingjun was a software engineer on the Redshift team at Amazon Web Services and a researcher in the Database group at IBM Almaden Research Center. Yingjun received his PhD from the National University of Singapore and was a visiting PhD student at Carnegie Mellon University. He has been working in the field of stream processing and database systems for over a decade.
Connect with Yingjun on LinkedIn.