Presented by
South Bay Systems
Systems meetup in the South Bay Area
121 Going

South Bay Systems: Caching & Clustering

About Event

Welcome to another edition of South Bay Systems! This time we bring you two wonderful talks: Yao Yue will speak about her experience working on caching systems, and Eric Liang will talk about automatic data clustering.

Agenda

  • 6:00 PM: Doors open, food and socializing

  • 6:30 PM — 6:35 PM: Introduction

  • 6:35 PM — 7:05 PM: Caching Talk

  • 7:05 PM — 7:35 PM: Clustering Talk

  • 7:35 PM onward: Community socializing!

Food and beverages will be provided, courtesy of our host, YugabyteDB.


Datacenter caching: A round trip to the ephemeral store

In systems research, a solution is only as good as one's understanding of the problem. By gathering and releasing real production traces, a small group of researchers and practitioners challenged long-held beliefs about which caching strategies work best, and incorporated practical design considerations such as implementation complexity and production fitness. Not only were the resulting algorithms widely adopted almost overnight, but the data also continue to fuel caching research in scenarios that extend far beyond the environment where these insights were first extracted.

This talk aims to provide a high-level roadmap of how this line of work came about, and to propose a formula for stimulating academia-industry collaboration and co-evolution. The hope is that many more round trips can be completed between labs and production, even in domains that have nothing to do with datacenter caching.

Speaker Bio

Yao is fascinated by everything about systems, especially how different parts interact in the real world. She is currently the co-founder and CEO of IOP Systems, a startup that focuses on improving inference performance and cost-effectiveness. In her past life, she led the Cache team at Twitter, open-sourced Pelikan Cache, and later founded the performance engineering team.


AutoLiquid: Autonomic Data Layout Optimization for the Databricks Lakehouse

Data layout recommendation is a classic problem in the database literature, yet the downstream challenge of autonomously applying these changes in live production systems is less frequently studied. In Lakehouse architectures, selecting the right clustering key is critical for data skipping performance, but it remains a manual process that cannot scale to the hundreds of millions of tables managed by platforms like Databricks. We present AutoLiquid, an autonomic system that solves this by continuously selecting, verifying, and applying clustering keys for Liquid Clustering tables, requiring only a single "CLUSTER BY AUTO" declaration from the user. AutoLiquid is currently deployed in production at Databricks, managing clustering keys for millions of tables.

Speaker Bio

Eric Liang leads the ML for Systems team at Databricks. His past work includes scaling distributed machine learning and reinforcement learning at Anyscale, where he served as the TL for the Ray project. He holds a PhD in Computer Science from UC Berkeley and was also part of the Databricks team earlier in his career.

Location
YugabyteDB, Inc.
100 Mathilda Pl ste 250, Sunnyvale, CA 94086, USA
There is a large underground parking lot accessible from 150 S Taaffe St, Sunnyvale, CA 94086. Also note that the venue is *very* walkable from Caltrain!