Presented by
South Bay Systems
Systems meetup in the South Bay Area
149 Going

South Bay Systems: Apache Pinot on Object Storage / Variants in Apache Doris

About Event

Welcome to another edition of South Bay Systems! This time, we'll have a double feature. First, Songqiao Su and Raghav Yadav will talk about optimizing Apache Pinot for real-time analytics; then Owen Xiao will talk about variants and semi-structured data in Apache Doris.

Agenda

  • 6:00 PM: Doors open, food and socializing

  • 6:30 PM – 7:00 PM: Apache Pinot Talk

  • 7:00 PM – 7:30 PM: Apache Doris Talk

  • 7:30 PM onward: Community socializing!

Food and beverages will be provided, courtesy of our hosts, Adobe.


Low-Latency Serving on Cloud Object Stores with Apache Pinot

In this talk, we present the evolution of Apache Pinot’s architecture: first from tightly coupled storage and compute, to decoupled cloud storage, and now toward native support for Parquet as a first-class segment format. We will discuss key technical innovations such as the implementation of a Parquet-compatible forward index reader, which enables all of Pinot’s indexing strategies to operate directly on Parquet files. Additional optimizations include index pinning, Parquet page-level selective reads, page prefetching for efficient I/O parallelism, and page caching. Together, these enhancements allow Pinot’s indexing and query execution framework to deliver sub-second performance directly on Parquet data, going far beyond conventional metadata-based pruning approaches.

Speaker Bio

Songqiao Su is a Staff Software Engineer at StarTree.AI, working on building tiered storage and improving compute–storage decoupling in Apache Pinot and StarTree Cloud. His work focuses on large-scale, high-performance distributed systems. Before joining StarTree, he worked on network and RPC infrastructure at Facebook and Databricks.

Raghav Yadav is a Staff Software Engineer at StarTree.AI, working on building a low-latency serving layer on Iceberg in Apache Pinot and StarTree Cloud. His expertise spans distributed databases and large-scale systems, with experience in cloud-scale data infrastructure at Microsoft Azure, real-time streaming databases as a founding engineer at Grainite, and now real-time OLAP analytics at StarTree.


The Evolution of Semi-Structured Data Analytics: From Text, JSON to VARIANT

Abstract

Semi-structured data, such as JSON, is gaining widespread adoption due to its flexibility. However, traditional databases and data warehouses are built for structured schemas, creating new challenges in storing and analyzing semi-structured formats. In this session, we’ll explore:

  • Characteristics and challenges of semi-structured data

  • Limitations of traditional approaches

  • Apache Doris’ native solution for semi-structured analytics

  • Comparison with Snowflake, Iceberg (VARIANT type), and Elasticsearch

  • Real-world applications in Log Analytics, Distributed Tracing, and IoT

Speaker Bio

Owen Xiao is a co-founder of VeloDB and a PMC member of Apache Doris, where he leads product strategy, observability, and AI-driven R&D for both open-source and enterprise data platforms. With over 10 years of experience in database kernel development and distributed systems architecture, he has helped scale analytical databases for global enterprises.

Location
Adobe Founders Tower
333 W San Fernando St, San Jose, CA 95113, USA
You may park in the underground garage; just tell gate security that your host is Aravind Sriram and that you're there for South Bay Systems. Then head to the lobby, register with reception, and wait in the lobby area for an Adobe employee to escort you to the talk area.