Real Time Data Lakes and AI

Hosted by Open Source Analytics Community & 5 others
About Event

​​The OSA Community is proud to present an evening of insights and networking in San Francisco, featuring experts from ClickHouse®, Dremio, RisingWave, LanceDB, and AWS.

​Real-time databases are integrating with data lakes to reduce storage costs and share data with AI and data science. Please join us to hear from a range of experts as they share current problems and solutions while navigating the transition from closed storage models to open table formats like Apache Iceberg.

Learn how to build high-performance real-time data lake systems. Networking follows the presentations. Food and drink provided!

Many thanks to Sentry for generously providing the venue and catering.


​​Speakers


Description of the Talks

​​Adapting ClickHouse® to use Apache Iceberg Storage - Robert Hodges, CEO @ Altinity.

  • Covers Altinity's Project Antalya, which adapts open source ClickHouse to separate compute from storage, using Iceberg tables for storage. The talk includes architecture, performance results, and the roadmap.

​​From Data Chaos to Autonomous Optimization: Iceberg Lakehouses for AI - Alex Merced, Head of DevRel @ Dremio.

  • Agentic AI requires speed and reliability, yet too often enterprises face query delays, manual tuning, and fragile architectures. Apache Iceberg brings order to the data lake, but managing catalogs, compaction, and performance at scale can be overwhelming. This talk demonstrates how Dremio eliminates these roadblocks by pairing Iceberg’s open table format with Polaris-powered metadata, autonomous Reflections, and Iceberg-native optimizations. Attendees will see how to transform raw, siloed data into an intelligent lakehouse ready for AI-driven decision-making.

Are You Sure Your Lakehouse Is Interoperable? - Yingjun Wu, Founder and CEO @ RisingWave Labs.

  • Interoperability is one of the core promises of the data lakehouse. Different engines should be able to read and write the same tables without friction. But deletes in Apache Iceberg show how fragile that promise can be. Position deletes are straightforward when row locations are stable, yet they do not work well with streaming CDC pipelines where data is always in motion. Equality deletes close that gap by filtering rows by key, but they bring their own problems. Delete files can pile up, queries can slow down, and engines often handle them inconsistently. In this talk, we will examine how deletes complicate interoperability in Iceberg, explore hybrid strategies that combine position and equality deletes, and explain why compaction is essential for keeping queries efficient across engines. We will also share lessons from running CDC pipelines into Iceberg in production and what it really takes to preserve interoperability in practice.
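
For readers new to the two delete styles named in this abstract, here is a minimal toy sketch in plain Python. It is not the Iceberg API and not the speaker's material: the file names, the `id` key, and all three helper functions are hypothetical, and it ignores details such as sequence numbers that decide which data files a real equality delete applies to. It only shows that a position delete needs the row's file and offset, an equality delete needs only the key, and compaction rewrites files so readers no longer have to apply either.

```python
# Toy model of row-level deletes; plain Python, not the Iceberg API.
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]


def apply_position_deletes(
    data_files: Dict[str, List[Row]],
    position_deletes: Set[Tuple[str, int]],
) -> List[Row]:
    """Drop rows identified by (file path, row position)."""
    return [
        row
        for path, rows in data_files.items()
        for pos, row in enumerate(rows)
        if (path, pos) not in position_deletes
    ]


def apply_equality_deletes(
    rows: List[Row], key: str, deleted_keys: Set[object]
) -> List[Row]:
    """Drop rows whose key column matches any deleted key value."""
    return [row for row in rows if row[key] not in deleted_keys]


def compact(
    data_files: Dict[str, List[Row]],
    position_deletes: Set[Tuple[str, int]],
    key: str,
    deleted_keys: Set[object],
) -> Dict[str, List[Row]]:
    """Rewrite each file with all deletes applied; drop files left empty."""
    compacted: Dict[str, List[Row]] = {}
    for path, rows in data_files.items():
        kept = [
            row
            for pos, row in enumerate(rows)
            if (path, pos) not in position_deletes
            and row[key] not in deleted_keys
        ]
        if kept:
            compacted[path] = kept
    return compacted


# Hypothetical data files for illustration only.
data_files = {
    "file_a.parquet": [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}],
    "file_b.parquet": [{"id": 3, "v": "c"}],
}

# Position delete: the writer must know where the row lives.
by_position = apply_position_deletes(data_files, {("file_a.parquet", 1)})

# Equality delete: the writer only needs the key, which suits CDC streams,
# but every reader must filter every row against the accumulated keys.
all_rows = [row for rows in data_files.values() for row in rows]
by_key = apply_equality_deletes(all_rows, "id", {2})

# Compaction applies both kinds of delete once, at rewrite time.
compacted = compact(data_files, {("file_a.parquet", 1)}, "id", {3})
```

In this toy model both delete styles leave the same rows behind; the difference is what the writer has to know and how much work each reader repeats until compaction runs.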


Biography of Speakers

Robert Hodges - Robert is the CEO of Altinity, an enterprise provider for the ClickHouse data warehouse. He's also a database geek with experience on at least 20 DBMS types. Robert caught the Kubernetes bug at VMware in 2018.

​Connect with Robert on LinkedIn.

Alex Merced - Alex is Head of DevRel for Dremio and co-author of "Apache Iceberg: The Definitive Guide" from O'Reilly. He has worked as a developer and instructor for companies like GenEd Systems, Crossfield Digital, CampusGuard and General Assembly.
Alex is passionate about technology and has published tech content through blogs, videos and his podcasts Datanation and Web Dev 101. He has contributed a variety of libraries in the JavaScript and Python worlds, including SencilloDB, CoquitoJS, dremio-simple-query and more.

​Connect with Alex on LinkedIn.

Yingjun Wu - Yingjun is the founder of RisingWave Labs (https://www.risingwave.com/), a database company developing RisingWave, a distributed SQL database for stream processing. Before running the company, Yingjun was a software engineer on the Redshift team at Amazon Web Services and a researcher in the Database group at IBM Almaden Research Center. Yingjun received his PhD from the National University of Singapore and was a visiting PhD student at Carnegie Mellon University. He has been working in the field of stream processing and database systems for over a decade.

​Connect with Yingjun on LinkedIn.

Location
Sentry
45 Fremont St, San Francisco, CA 94105, USA