

Open Source Data Streaming Meetup Bay Area
Join us on March 17th (Tuesday) from 5:30-8:30 PM at the Snowflake Menlo Park Office.
Connect with fellow community members, share insights, and dive into the latest developments in data streaming around Apache Kafka®, Apache Flink®, and more!
Note on Parking: Free parking is available at the venue.
Agenda
5:30 PM - 6:15 PM: Doors Open & Networking
6:15 PM - 8:00 PM: Welcome Remarks & Presentations!
8:00 PM - 8:30 PM: More Networking
Sessions
Deep Dive into Flink's Disaggregated State Management, Vasia Kalavri, Boston University
If you've operated Flink jobs with large state, you've probably hit some familiar pain points: long recovery times, unexpected CPU spikes from RocksDB compactions, or running out of local disk space at the worst possible time. This talk explores how Apache Flink 2.x fundamentally changes the game by separating compute from storage, enabling faster scaling and recovery. We'll dive into the internals of disaggregated state management, discuss why naively combining a remote state backend with Flink's synchronous execution model is a bad idea, and explain how to make the runtime asynchronous while ensuring Flink's out-of-order execution semantics and fault-tolerance guarantees are preserved.
Under the Hood: The Evolution of Snowflake's Streaming Ingest (V1 to V2), Tyler Jones, Snowflake
Snowflake's streaming ingest has gone through a significant architectural evolution. In this talk, we'll take you inside the journey from Snowpipe Streaming V1 to its high-performance successor, V2 — how they work under the hood, what changed, and why.
We'll start with the V1 architecture, walk through how it maps to Kafka's topic/partition model via Kafka Connect, and explain how we achieve exactly-once semantics at scale. We'll share real-world lessons from operating V1 in production — particularly its impact on downstream query performance — and how those learnings drove the design of V2.
From there, we'll dig into the V2 + Kafka Connect integration: the new challenges that came with the architecture shift, including data validation at ingest time, handling schema evolution across heterogeneous topics, and the trade-offs we made along the way. If you're running Kafka pipelines that land data in an analytical store, this talk is a look at the hard problems behind making that fast, correct, and reliable.