South Bay Systems: Caching & Clustering

Name: South Bay Systems: Caching & Clustering
Start: 2026-06-26T18:00:00.000-07:00
End: 2026-06-26T20:00:00.000-07:00
Location: YugabyteDB, Inc.

About Event

Welcome to another edition of South Bay Systems! This time we bring you two wonderful talks: Yao Yue will speak about her experiences working on caching systems, and Eric Liang will talk about automatic data clustering.

Agenda

6:00 PM: Doors open, food and socializing
6:30 PM - 6:35 PM: Introduction
6:35 PM - 7:05 PM: Caching Talk
7:05 PM - 7:35 PM: Clustering Talk
7:35 PM onward: Community socializing!

Food and beverages will be provided, courtesy of our host, YugabyteDB.

Datacenter caching: A round trip to the ephemeral store

As far as systems research goes, the solution is only as good as one’s understanding of the problem. By gathering and releasing real production traces, a small group of researchers and practitioners challenged long-held beliefs about which caching strategies work best, and incorporated practical design considerations such as implementation complexity and production fitness. Not only did the resulting algorithms become widely adopted almost overnight, but the data also continue to fuel caching research in scenarios that extend far beyond the original environment where these insights were first extracted.

This talk aims not only to provide a high-level roadmap of how this line of work came about, but also to propose a formula for stimulating academia-industry collaboration and co-evolution. The hope is that many more round trips can be completed between labs and production, even in domains that may not have anything to do with datacenter caching.

Speaker Bio

Yao is fascinated by everything about systems, especially how different parts interact in the real world. She is currently the co-founder and CEO of IOP Systems, a startup that focuses on improving inference performance and cost-effectiveness. In her past life, she led the Cache team at Twitter, open-sourced Pelikan Cache, and later founded the performance engineering team.

AutoLiquid: Autonomic Data Layout Optimization for the Databricks Lakehouse

Data layout recommendation is a classic problem in the database literature, yet the downstream challenge of autonomously applying these changes in live production systems is less frequently studied. In Lakehouse architectures, selecting the right clustering key is critical for data skipping performance, but it remains a manual process that cannot scale to the hundreds of millions of tables managed by platforms like Databricks. We present AutoLiquid, an autonomic system that solves this by continuously selecting, verifying, and applying clustering keys for Liquid Clustering tables, requiring only a single "CLUSTER BY AUTO" declaration from the user. AutoLiquid is currently deployed in production at Databricks, managing clustering keys for millions of tables.

Speaker Bio

Eric Liang leads the ML for Systems team at Databricks. His past work includes scaling distributed machine learning and reinforcement learning at Anyscale, where he served as the TL for the Ray project. He holds a PhD in Computer Science from UC Berkeley and was also part of the Databricks team earlier in his career.

Location

YugabyteDB, Inc.

100 Mathilda Pl, Ste 250, Sunnyvale, CA 94086, USA

Enter the public garage via Evelyn St. Free parking is available from this side. When coming into the building via the lobby or garage, take the far-left elevator from the garage elevator bank up to the 2nd floor. Also note that it's *very* walkable from Caltrain!

Presented by

South Bay Systems

Systems meetup in the South Bay Area
