57 Going

AI Lakehouse Meetup - Bay Area

Hosted by Dipankar Mazumdar & 3 others
Register to See Address
San Jose, CA
About Event

Join us at the Cloudera San Jose Office for an evening of talks and networking!

We are bringing the Data+AI community together to talk about open table formats such as Apache Iceberg and Lance, along with emerging AI infrastructure topics and the primitives powering agentic systems.

Agenda

  • 5:00 – 5:30 PM: Registration & Networking

  • 5:30 – 7:30 PM: Four technical talks

  • 7:30 – 8:00 PM: Networking & Snacks

Talk 1: Building the Multimodal Lakehouse for AI with LanceDB

The next wave of AI applications demands seamless, scalable access to text, images, embeddings, and other complex modalities, but current lakehouse solutions still force teams into closed systems for vector search, full-text search, or feature engineering, reintroducing data silos. In this talk, we introduce Lance, a next-generation columnar data format optimized for AI, and LanceDB, the multimodal lakehouse built on top of it. Together, they provide low-latency access; unified vector, full-text, and SQL search; and flexible schema evolution across the entire multimodal AI lifecycle.

From application serving to feature engineering and large-scale training, Lance and LanceDB empower innovators like Netflix, Runway, and WorldLabs to build open, performant, and production-grade multimodal systems at scale.

Speakers: ChanChan Mao (DevRel @ LanceDB) & Lu Qiu (Database Engineer @ LanceDB)

Talk 2: Putting Agents in your Data Platforms - Are we Ready? (with Apache Iceberg & Cloudera AI)

Data platforms traditionally have used deterministic pipelines for predictable query patterns, but Agentic AI introduces a different execution model where agents dynamically explore data systems by probing schemas, issuing iterative queries, validating hypotheses, and refining their approach based on intermediate results. This creates a new class of workload - agentic workflows over enterprise data systems.

This session will go over:

  • The architectural primitives required to manage these complex, unpredictable agentic workloads over enterprise data systems

  • The core building blocks for agentic workflows - isolation mechanisms, context, governance & auditability

  • How Iceberg features, including snapshot-based storage and branching semantics, work in favor of these new workloads

  • How Cloudera's data/AI platform, built on the open foundation of Apache Iceberg, supports building Agentic workflows
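
As a rough illustration of why snapshots and branches suit exploratory agents, here is a toy in-memory model in Python. It is a sketch under stated assumptions, not Apache Iceberg's API or the speaker's material: ToyTable, Snapshot, and every method name are hypothetical, and real branch operations involve validation that this model skips.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """An immutable version of a table's rows (toy stand-in for a table snapshot)."""
    snapshot_id: int
    rows: tuple


@dataclass
class ToyTable:
    """Hypothetical in-memory table: named branches point at immutable snapshots."""
    snapshots: dict = field(default_factory=dict)
    branches: dict = field(default_factory=dict)
    _next_id: int = 1

    def commit(self, branch, rows):
        # Every write creates a new snapshot; old snapshots are never mutated.
        snap = Snapshot(self._next_id, tuple(rows))
        self.snapshots[snap.snapshot_id] = snap
        self.branches[branch] = snap.snapshot_id
        self._next_id += 1
        return snap.snapshot_id

    def create_branch(self, name, from_branch="main"):
        # A branch is only a new pointer to an existing snapshot; no data is copied.
        self.branches[name] = self.branches[from_branch]

    def read(self, branch="main"):
        return list(self.snapshots[self.branches[branch]].rows)

    def fast_forward(self, target, source):
        # Publish reviewed work by moving the target pointer (no ancestry check here).
        self.branches[target] = self.branches[source]


table = ToyTable()
main_v1 = table.commit("main", [{"id": 1, "amount": 10}])

# Isolation: the agent writes to its own branch, so main is untouched.
table.create_branch("agent-run")
table.commit("agent-run", table.read("agent-run") + [{"id": 2, "amount": 99}])
main_before_publish = table.read("main")

# Governance: publish only after review, by repointing main.
table.fast_forward("main", "agent-run")
main_after_publish = table.read("main")

# Auditability: the earlier snapshot is still readable by its id.
audit_view = list(table.snapshots[main_v1].rows)
```

In this model an agent's intermediate writes stay invisible to readers of main until someone fast-forwards it, and every earlier state remains addressable by snapshot id for review.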

Speaker: Dipankar Mazumdar (Director-Developers @ Cloudera)

Talk 3: Agent Context at Scale: Graph + SQL on Apache Iceberg

Agentic systems place new demands on data infrastructure: scalability, performance, and guardrails to keep agents grounded in accurate context. At the same time, they push natural language interfaces beyond text-to-SQL, freeing retrieval to use the right tool for the right job.

In this talk, we introduce a pluggable text-to-insight framework built on Apache Iceberg that runs both SQL and Cypher over the same underlying data, giving agents richer context for better reasoning, without duplication or new silos. We'll end with a proof-of-concept demo to show it in action.

Speaker: Jaz Ku (Solution Architect @ PuppyGraph)

Talk 4: Architecting the AI-Native, Cross-Cloud Lakehouse

Adopting open table formats like Apache Iceberg has historically meant navigating a trade-off between true open interoperability and the operational ease of a fully managed platform.

In this session, we’ll explore how to architect a borderless, cross-cloud data foundation that delivers openness without compromise and is purpose-built for the agentic era. We will dive into how Google Cloud’s Lakehouse architecture leverages the open Iceberg REST Catalog to provide a unified metadata layer across any compatible engine, allowing you to seamlessly query a single copy of data using BigQuery, Managed Spark, or Trino. 

We’ll discuss how this setup unlocks advanced capabilities directly on open formats, including high-throughput streaming and multi-statement transactions. Finally, we’ll demonstrate how to pair this open foundation with GCP's Knowledge Catalog for advanced discoverability and governance, effectively transforming passive Iceberg metadata into an active semantic knowledge engine for AI agents.

Join us for an architectural deep dive and a live demo on bridging open standards with fully managed AI capabilities.

Speaker: Vinod Ramachandran (Google)

Note: Due to limited venue capacity, registrations are subject to approval. Approved attendees will receive a confirmation email from Luma. If the event reaches capacity, additional registrations will be placed on the waitlist.

** LOCATION-SPECIFIC CHECK-IN INSTRUCTIONS WILL BE SHARED SOON **
