Open Lakehouse and AI
The OSA Community is proud to host the Real Time Data Lakes and AI event in San Francisco!
As real-time databases integrate more closely with data lakes to reduce storage costs and unlock data for AI and advanced analytics, data infrastructure is evolving fast. Join us to hear from leading experts as they share practical solutions and lessons learned in building open, scalable, and high-performance data platforms.
Food and drinks provided!
Shoutout to PostHog for sponsoring the venue!
Speakers
Robert Hodges, CEO @ Altinity
Kaisen Kang, Head of Query & Agent Team @ CelerData
Éamon Ryan, Senior Principal Field Engineer @ Grafana
James Greenhill, Data Peddler @ PostHog
Agenda
6:00 - 6:15 pm - Networking
6:15 - 8:00 pm - Talks
8:00 - 9:00 pm - Networking
Description of the Talks
Building a Foundation for AI with ClickHouse® and Apache Iceberg Storage
Speaker: Robert Hodges, CEO @ Altinity
Abstract: AI applications need data. Lots of it. Altinity's Project Antalya is adapting open source ClickHouse® to introduce separation of compute and storage on shared Iceberg table data. The result: fast, cheap, flexible querying that extends the life of real-time analytic applications and lays the foundation for handling new AI use cases on the same datasets. We cover architecture, performance results, roadmap, and how to get started yourself.
What AI Data Agents Need from an Analytics Engine
Speaker: Kaisen Kang, Head of Query & Agent Team @ CelerData
Abstract: AI data agents rely on iterative, agent-generated SQL to answer questions, explore data, and refine results across multiple turns. In production, this places strict demands on the analytics engine: low-latency execution to maintain conversational flow, high concurrency to support many users and agents, efficient joins and aggregations for real analytical workloads, and strong controls to prevent runaway cost or unsafe queries.
This talk outlines 10 core engine capabilities required to support AI data agents in practice, using StarRocks as an example. We’ll examine how modern analytical engines handle agent-driven query patterns, frequent re-computation, real-time and semi-structured data, and governance at scale—and what to look for when evaluating an engine for AI-powered analytics.
Visualizing Your Data Lake with Grafana
Speaker: Éamon Ryan, Senior Principal Field Engineer @ Grafana
Abstract: In this brief talk, we’ll walk through how to get started with Grafana’s open source platform to explore and understand your data lake. We’ll cover how to connect to your data—no matter where it lives—then craft queries that turn raw information into clear, compelling visualizations, and finally set up alerts and annotations so you’re always in the know when something important changes in your data lake.
Getting Your Ducks in a Row
Speaker: James Greenhill, Data Peddler @ PostHog
Abstract: We are leveraging DuckDB's incredible execution engine to build our next-generation data warehouse, providing single-tenant installations with serverless flexibility. This talk will cover how we developed a Control Plane / Data Plane model, built on DuckLake, for executing DuckDB workers anywhere and with any concurrency, and what this will look like in the future. We will also cover what we think are the missing links that have kept DuckDB from being more widely adopted in the past.
