Real Time Data Lakes and AI

Name: Real Time Data Lakes and AI
Start: 2025-11-20T17:30:00.000-08:00
End: 2025-11-20T21:00:00.000-08:00
Location: Sentry

Hosted by Open Source Analytics Community & 8 others

Sentry

San Francisco, California

About Event

The OSA Community is proud to present an evening of insights and networking in San Francisco, featuring experts from Altinity, Dremio, RisingWave, LanceDB, and AWS.

Real-time databases are integrating with data lakes to reduce storage costs and to share data with AI and data science workloads. Join us to hear a range of experts share the problems they face, and the solutions they have found, as they move from closed storage models to open table formats like Apache Iceberg.

Learn how to build high-performance real-time data lake systems. Networking will follow the presentations. Food and drink provided!

Many thanks to Sentry for generously providing the venue and catering.

Speakers

Robert Hodges, CEO @ Altinity
Alex Merced, Head of DevRel @ Dremio
Yingjun Wu, Founder and CEO @ RisingWave Labs
Lei Xu, Co-founder & CTO @ LanceDB
Prachi Gupta, Sr. Data & AI Solution Architect @ AWS

Description of the Talks

Introducing Hybrid Tables from Project Antalya: Combine MergeTree and Iceberg Data in a Single ClickHouse® Table - Robert Hodges, CEO @ Altinity

Open source ClickHouse® stores data in replicated MergeTree tables. They are fast but expensive, and they become hard to manage as they grow large. Our goal at Project Antalya is to let users transparently extend MergeTree tables onto cheap, shared Apache Iceberg storage. This talk introduces the Hybrid Table Engine, a new feature of the Antalya 25.8 build. Hybrid tables let users create a single table that points to hot data in MergeTree and cold data in Apache Iceberg. We'll show you how hybrid tables work, how to set them up yourself, and our roadmap for improving them.
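
The abstract does not describe the engine's internals. As a rough conceptual illustration only, the Python sketch below assumes a hybrid table split at a single date watermark, with newer rows in MergeTree and older rows in Iceberg. It shows how a range query could be divided between the two tiers and the partial results concatenated. The watermark rule and the names `Segment`, `plan_segments`, `route_range`, and `query` are hypothetical; this is not Antalya's syntax or implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Segment:
    name: str               # hypothetical storage label: "mergetree" (hot) or "iceberg" (cold)
    lo: Optional[date]      # inclusive lower bound; None means unbounded
    hi: Optional[date]      # exclusive upper bound; None means unbounded

def plan_segments(watermark: date) -> list:
    """Hypothetical layout: rows before the watermark are cold (Iceberg), the rest hot (MergeTree)."""
    return [Segment("iceberg", None, watermark), Segment("mergetree", watermark, None)]

def route_range(segments, start: date, end: date):
    """Split the query range [start, end) into one sub-range per segment it touches."""
    pieces = []
    for seg in segments:
        lo = start if seg.lo is None else max(start, seg.lo)
        hi = end if seg.hi is None else min(end, seg.hi)
        if lo < hi:  # skip segments the query never touches
            pieces.append((seg.name, lo, hi))
    return pieces

def query(stores, segments, start: date, end: date):
    """Scan each touched segment and concatenate the results (like UNION ALL)."""
    rows = []
    for name, lo, hi in route_range(segments, start, end):
        rows.extend(r for r in stores[name] if lo <= r["day"] < hi)
    return rows

# Usage: a December-to-January query reads both tiers; a March query reads only MergeTree.
segments = plan_segments(date(2025, 1, 1))
print(route_range(segments, date(2024, 12, 1), date(2025, 2, 1)))
print(route_range(segments, date(2025, 3, 1), date(2025, 4, 1)))
```

Splitting on one watermark keeps routing trivial: each tier owns a disjoint range, so no row is read twice and queries over recent data never touch object storage.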

From Data Chaos to Autonomous Optimization: Iceberg Lakehouses for AI - Alex Merced, Head of DevRel @ Dremio

Agentic AI requires speed and reliability, yet too often enterprises face query delays, manual tuning, and fragile architectures. Apache Iceberg brings order to the data lake, but managing catalogs, compaction, and performance at scale can be overwhelming. This talk demonstrates how Dremio eliminates these roadblocks by pairing Iceberg’s open table format with Polaris-powered metadata, autonomous Reflections, and Iceberg-native optimizations. Attendees will see how to transform raw, siloed data into an intelligent lakehouse ready for AI-driven decision-making.

Are You Sure Your Lakehouse Is Interoperable? - Yingjun Wu, Founder and CEO @ RisingWave Labs

Interoperability is one of the core promises of the data lakehouse. Different engines should be able to read and write the same tables without friction. But deletes in Apache Iceberg show how fragile that promise can be. Position deletes are straightforward when row locations are stable, yet they do not work well with streaming CDC pipelines where data is always in motion. Equality deletes close that gap by filtering rows by key, but they bring their own problems. Delete files can pile up, queries can slow down, and engines often handle them inconsistently. In this talk, we will examine how deletes complicate interoperability in Iceberg, explore hybrid strategies that combine position and equality deletes, and explain why compaction is essential for keeping queries efficient across engines. We will also share lessons from running CDC pipelines into Iceberg in production and what it really takes to preserve interoperability in practice.
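
To make the two delete types concrete, here is a simplified, in-memory Python model. It is not a real Iceberg library. It follows the Iceberg v2 rule that a position delete applies to a data file whose data sequence number is less than or equal to its own, while an equality delete applies only to files with a strictly lower sequence number. That rule is what lets a CDC upsert remove the old row by key and insert its replacement in the same commit. Partition matching, file formats, and real metadata are omitted, and `DataFile`, `scan`, and `compact` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class DataFile:
    path: str
    seq: int          # data sequence number of the commit that wrote the file
    rows: list        # a row's position is its list index

@dataclass
class PositionDelete:
    seq: int
    path: str         # data file the deleted row lives in
    pos: int          # row position inside that file

@dataclass
class EqualityDelete:
    seq: int
    key: dict         # e.g. {"id": 2}: delete every matching row from older commits

def scan(files, pos_deletes, eq_deletes):
    """Merge-on-read: return live rows after applying both kinds of delete."""
    live = []
    for f in files:
        # Position deletes apply to the named file if it is not newer than the delete.
        dead = {d.pos for d in pos_deletes if d.path == f.path and f.seq <= d.seq}
        for i, row in enumerate(f.rows):
            if i in dead:
                continue
            # Equality deletes apply only to strictly older files, so a CDC upsert's
            # replacement row written in the same commit survives.
            if any(f.seq < d.seq and all(row.get(k) == v for k, v in d.key.items())
                   for d in eq_deletes):
                continue
            live.append(row)
    return live

def compact(files, pos_deletes, eq_deletes, path="compacted-0"):
    """Rewrite live rows into one file and drop the delete files (simplified)."""
    seq = max((f.seq for f in files), default=0)
    return [DataFile(path, seq, scan(files, pos_deletes, eq_deletes))], [], []

# Usage: a tiny CDC history.
files = [
    DataFile("f1", 1, [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]),
    DataFile("f2", 2, [{"id": 2, "v": "b2"}]),    # commit 2: new version of id 2
]
eq_deletes = [EqualityDelete(2, {"id": 2})]        # commit 2: remove old id 2 by key
pos_deletes = [PositionDelete(3, "f1", 2)]         # commit 3: delete id 3 by position
print(scan(files, pos_deletes, eq_deletes))        # [{'id': 1, 'v': 'a'}, {'id': 2, 'v': 'b2'}]
```

Every reader must repeat the key comparison in `scan` for each accumulated equality delete, which is why delete files slow queries as they pile up. Compaction here simply materializes the merged result and drops the delete files, so later scans no longer pay that cost.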

LanceDB, a Multimodal Lakehouse for AI - Lei Xu, Co-founder & CTO @ LanceDB

The next wave of AI applications demands seamless, scalable access to text, images, embeddings, and other complex modalities. Yet current lakehouse solutions still force teams into closed systems for vector search, full-text search, or feature engineering, reintroducing data silos. In this talk, we introduce Lance, a next-generation columnar data format optimized for AI, and LanceDB, the multimodal lakehouse built on top of it. Together, they provide low-latency access, unified vector, full-text, and SQL search, and flexible schema evolution across the entire multimodal AI lifecycle, from application serving to feature engineering and large-scale training. They empower innovators like Midjourney, Runway, and Netflix to build open, performant, production-grade multimodal systems at scale.

Managing Apache Iceberg Tables with Amazon S3: High Performance and Interoperability at Scale - Prachi Gupta, Sr. Data & AI Solution Architect @ AWS

Amazon S3 Tables delivers a fully managed storage solution for Apache Iceberg data lakes, offering high performance and seamless interoperability across analytics platforms. We'll dive into how S3 Tables provides 10x faster transactions compared to standard S3 buckets while maintaining vendor-agnostic compatibility through the Iceberg REST Catalog (IRC). The session covers key features including automated maintenance, intelligent compaction strategies, and flexible integration options that let both AWS native services and third-party applications work with the same datasets. Learn how organizations can leverage S3 Tables to build high-performance data lakes without sacrificing interoperability or getting locked into proprietary formats. Real-world examples will demonstrate how enterprises are using S3 Tables to manage massive datasets while maintaining open standards compliance and cross-platform accessibility.

Biography of Speakers

Robert Hodges - Robert is the CEO of Altinity, an enterprise provider of the ClickHouse data warehouse. He's also a database geek with experience on at least 20 types of DBMS. Robert caught the Kubernetes bug at VMware in 2018.

Alex Merced - Alex is Head of DevRel at Dremio and co-author of "Apache Iceberg: The Definitive Guide" from O'Reilly. He has worked as a developer and instructor for companies such as GenEd Systems, Crossfield Digital, CampusGuard, and General Assembly.
Alex is passionate about technology and publishes tech content through blogs, videos, and his podcasts Datanation and Web Dev 101. He has contributed a variety of libraries to the JavaScript and Python ecosystems, including SencilloDB, CoquitoJS, and dremio-simple-query.

Yingjun Wu - Yingjun is the founder of RisingWave Labs (https://www.risingwave.com/), a database company developing RisingWave, a distributed SQL database for stream processing. Before starting the company, Yingjun was a software engineer on the Redshift team at Amazon Web Services and a researcher in the database group at IBM Almaden Research Center. He received his PhD from the National University of Singapore and was a visiting PhD student at Carnegie Mellon University. He has worked in stream processing and database systems for over a decade.

Prachi Gupta - Prachi is a Senior Data & AI Solution Architect at AWS, specializing in designing and implementing large-scale data infrastructure and data migration solutions. She has extensive experience building robust data management systems, with a focus on integrating storage with analytical and AI platforms. Prachi helps customers modernize their data platforms on AWS using the latest services and technologies. Her current focus is Amazon S3 and the S3 Tables feature, which enables efficient management and querying of structured data at scale. She has built solutions for migrating Iceberg and Hive data to S3 Tables, created multiple workshops on S3 Tables use cases, and tested the product and its performance.

Location

Sentry

45 Fremont St, San Francisco, CA 94105, USA