228 Went
Past Event
About Event

Lakehouse, AI, and Iceberg Meetup: San Francisco

Join the leading voices in modern data architecture for an evening of insight and networking.

As the data landscape shifts toward open formats and real-time intelligence, staying ahead of the curve is essential. Join OLake, RisingWave, Datastrato, Ryft, and Dremio at the heart of North Beach for a deep dive into the intersection of Data Lakehouses, Artificial Intelligence, and Apache Iceberg.

Whether you are a data engineer, architect, or AI practitioner, this meetup is designed to bridge the gap between storage, processing, and actionable intelligence.

📍 Location & Time

  • Date: Wednesday, February 11th

  • Venue: Dream Event SF, 1524 Powell Street, San Francisco, CA

  • Doors Open: 6:00 PM

  • Event Ends: 9:00 PM

🛠 Featured Organizers

Learn from the experts behind some of the most innovative tools in the ecosystem:

  • OLake: Revolutionizing how we sync and manage data.

  • RisingWave: The streaming database for real-time applications.

  • Datastrato: Empowering data platform teams with Gravitino.

  • Ryft: The intelligent Iceberg management platform.

  • Dremio: The agentic lakehouse platform.

  • VeloDB: A real-time analytics and search database for diverse workloads.

💡 Why Attend?

  • Technical Deep Dives: Hear how Apache Iceberg is becoming the standard for open table formats.

  • AI Integration: Discuss how lakehouse architectures are evolving to support LLMs and generative AI workloads.

  • Community Networking: Connect with the San Francisco data community over drinks and shared insights.

Talks

Alex Merced - Head of DevRel, Dremio

Data Lakehouses, Federation and Data Virtualization
Projects move fast, and they slow down when data sits in silos. Teams need quick access to current data, and they need a way to read it without complex pipelines. A lakehouse gives you one place to store and manage data with strong governance. Query federation lets you reach data that still lives in other systems. Data virtualization presents all of it as one logical layer.

This talk explains how these three ideas work together to create a unified view of your data. You learn why this matters for model training, feature work, and agent workflows. You also see how a unified layer cuts friction, shortens planning cycles, and reduces the cost of moving data. The session offers clear guidance on when to store data in the lakehouse, when to federate, and how to use virtualization to keep access simple and fast.

----

Rohan Khameshra - Co-founder, Datazip

Title: Why Iceberg-Native Ingestion Matters (and What Breaks If You Ignore It)

Abstract: Apache Iceberg is great at scaling data, but it doesn’t fix problems caused by bad ingestion. If you get ingestion wrong, you end up with small files, bloated metadata, and correctness issues that are painful to clean up later. In this talk, I’ll share what we learned building OLake, an open-source, Iceberg-native ingestion engine, including mistakes we made and how we fixed them. We’ll look at how to ingest data fast without breaking tables, and how to handle schema changes and exactly-once writes safely. I’ll also briefly compare OLake with tools like Debezium and Flink and with managed ELT solutions, and explain why building ingestion specifically for Iceberg is very different from just adding Iceberg as a sink.

Speaker Bio: Rohan Khameshra is the co-founder of Datazip and the creator of OLake, an open-source, Iceberg-native ingestion engine. He works closely with data engineering teams adopting Apache Iceberg for analytics and AI workloads, focusing on high-throughput CDC, table health, and multi-engine interoperability. His work centers on designing ingestion systems that treat Iceberg as a first-class storage layer rather than a downstream sink.

----

Andrew Dong - President, Datastrato

Making Iceberg REST Production-Ready: Security, Identity, and Governance

Apache Iceberg and the Iceberg REST Catalog simplify the decoupling of compute from storage, but they leave a critical gap: security and governance. As teams transition from experimentation to production, this gap becomes a major blocker for scaling multi-tenant lakehouses.

This talk shows how Iceberg REST can be made production-ready through identity-aware governance. Using Apache Gravitino as a reference architecture, we will demonstrate authentication, fine-grained access control, and secure credential vending for cloud storage. Attendees will also see how Spark, Trino, and Flink connect to a secured Iceberg REST endpoint without custom engine forks, enabling secure and vendor-neutral lakehouse platforms.

----

Yuval Yogev - CTO, Ryft

Title of Talk: Implementing Intelligent Snapshot Management

Abstract:
Apache Iceberg snapshots enable time travel and rollback, but they are not free. What do you do when you can only afford to keep a few thousand of them?

With streaming ingestion, frequent commits, and compaction, tables can accumulate thousands of snapshots per day. Retention quickly becomes expensive without actually preserving useful restore points.

This session dives into how we implemented intelligent snapshot management. We present time-aware retention models that preserve what matters: high-resolution snapshots for recent history, and calendar-aligned restore points for long-term recovery. Instead of treating snapshots as temporary logs or hoarding them indefinitely, we apply backup patterns from databases and filesystems, leveraging Iceberg’s native snapshot and tagging semantics to make retention predictable and operationally sustainable.
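
For readers who want a concrete picture of a time-aware retention model, here is a minimal sketch in plain Python. It is an illustration only, not Ryft's implementation: the `Snapshot` type, the `select_snapshots_to_keep` function, and the 24-hour and 30-day windows are all hypothetical. It keeps every snapshot from recent history and, for older history, only the latest snapshot of each calendar day.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, NamedTuple, Set


class Snapshot(NamedTuple):
    """Hypothetical stand-in for a table snapshot (not an Iceberg API type)."""
    snapshot_id: int
    committed_at: datetime  # timezone-aware commit time


def select_snapshots_to_keep(
    snapshots: Iterable[Snapshot],
    now: datetime,
    recent_window: timedelta = timedelta(hours=24),  # assumed value
    daily_window: timedelta = timedelta(days=30),    # assumed value
) -> Set[int]:
    """Return the IDs of snapshots to retain.

    - Inside `recent_window`: keep every snapshot (high resolution).
    - Older, but inside `daily_window`: keep only the latest snapshot
      of each calendar day (calendar-aligned restore points).
    - Older than `daily_window`: keep nothing.
    """
    keep: Set[int] = set()
    latest_per_day = {}
    for snap in snapshots:
        age = now - snap.committed_at
        if age <= recent_window:
            keep.add(snap.snapshot_id)
        elif age <= daily_window:
            day = snap.committed_at.date()
            current = latest_per_day.get(day)
            if current is None or snap.committed_at > current.committed_at:
                latest_per_day[day] = snap
    keep.update(s.snapshot_id for s in latest_per_day.values())
    return keep


# Example: two recent snapshots, two from the same older day, one very old.
utc = timezone.utc
now = datetime(2025, 3, 10, 12, 0, tzinfo=utc)
history = [
    Snapshot(1, datetime(2025, 3, 10, 11, 0, tzinfo=utc)),
    Snapshot(2, datetime(2025, 3, 10, 6, 0, tzinfo=utc)),
    Snapshot(3, datetime(2025, 3, 5, 8, 0, tzinfo=utc)),
    Snapshot(4, datetime(2025, 3, 5, 20, 0, tzinfo=utc)),
    Snapshot(5, datetime(2025, 1, 1, 0, 0, tzinfo=utc)),
]
kept = select_snapshots_to_keep(history, now)
print(sorted(kept))  # [1, 2, 4]
```

In a real table, the daily restore points would presumably be pinned with Iceberg tags so that routine snapshot expiry does not remove them; the sketch only decides which snapshots qualify.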

Speaker Bio:
I’m passionate about building high-throughput distributed systems and making complex data platforms simple, resilient, and scalable. Today, I’m the Co-Founder and CTO of Ryft, focused on next-generation data infrastructure. Before that, I spent several years as Chief Architect at Sygnia, helping companies strengthen their cyber resilience through scalable platforms and fast data pipelines.

----

Rayees Pasha - CPO, RisingWave

Engineered for Performance: DataFusion-Powered Iceberg Analytics in RisingWave


Abstract: This talk explores how Apache DataFusion can be used to boost query performance on Apache Iceberg tables within RisingWave. We look at how DataFusion’s efficient execution engine can accelerate the Iceberg query and compaction workloads embedded in RisingWave’s real-time platform. We’ll cover integration strategies, performance characteristics, and practical engineering considerations for optimizing Iceberg analytics in a unified streaming and batch environment.

Speaker Bio: Rayees is CPO at RisingWave Labs, responsible for all areas of product management and marketing. His expertise is in the areas of data management and big data analytics. He has held product management roles delivering enterprise software in both traditional and SaaS environments. Prior to moving to product management, he worked at Hewlett-Packard as a software designer working on different aspects of database management systems.

----

Kevin Shen - Principal Product Manager, VeloDB

Topic: From Iceberg to AI: Accelerating Lakehouse Analytics and Hybrid Retrieval with Apache Doris

Abstract:

The Lakehouse architecture, powered by open table formats such as Apache Iceberg, has become the foundation for modern analytics and AI workloads. However, efficiently serving diverse query patterns, from large-scale analytical SQL to text and vector retrieval, in a single system remains a key challenge.

In this talk, we introduce how Apache Doris enhances the Iceberg-based Lakehouse with high-performance query acceleration and materialized views, enabling faster and more cost-efficient analytics directly on open data. 

We will also explore new Apache Doris features for AI that unify analytics across structured, semi-structured, and unstructured data through the combination of search and analytics. We will look at how Apache Doris combines analytical queries, full-text search, and vector search in a single system, simplifying the architecture of AI knowledge stores.

Lastly, we will review the Apache Doris roadmap and share our plans for deeper integration with open table formats, including managed Iceberg tables and native inverted and vector indexing directly on Iceberg data, enabling zero-copy, high-performance analytical and retrieval workloads.

Location
1524 Powell St
San Francisco, CA 94133, USA