Presented by

We organise community events and webinars on data engineering topics such as CDC, Apache Iceberg, and ETL from databases to data lakehouses.

152 Went

AI

Lakehouse at Scale: Doris x OLake

Name: Lakehouse at Scale: Doris x OLake
Start: 2026-05-09T11:00:00.000+05:30
End: 2026-05-09T14:00:00.000+05:30
Location: Bengaluru, India

OLake Community Events

Registration Closed

This event is not currently taking registrations. You may contact the host or subscribe to receive updates.

About Event

Lakehouse at Scale

Apache Iceberg adoption is accelerating, and with it come two operational realities data teams are running into head-on: table maintenance at scale and the demand for real-time, accurate retrieval powering AI systems.

This meetup brings together practitioners and contributors working at both ends of that problem. Expect technical depth, real production context, and open discussion with engineers who are actively building and operating lakehouse infrastructure.

Agenda

11:00 - 11:30 | Registration and Welcome

11:30 - 12:10 | How Apache Doris Powers AI Agents with Hybrid Search and Real-Time Analytics
Matt Yi, Apache Doris PMC Member, Tech VP at VeloDB

Why single-method retrieval (vector-only or keyword-only) breaks down in production AI systems
Hybrid search architecture: combining vector search, full-text search, and SQL for accurate, intent-aware retrieval
How Apache Doris's native real-time OLAP capability extends into real-time RAG pipelines
Cost and accuracy tradeoffs across retrieval strategies and what that means for context engineering at scale

12:10 - 12:50 | OLake Fusion: Solving Apache Iceberg Table Maintenance Problems at High Scale
Ankit Sharma, Tech Lead, and Badal Prasad Singh, Software Engineer, OLake

Why continuous CDC ingestion at scale causes small-file accumulation and degrades query performance in Iceberg tables
Compaction strategies (lite, medium, full) and how to choose the right mode based on workload and file size targets
Cron-based scheduling and table enable/disable controls via Helm and Docker Compose
Multi-catalog support and lessons from building maintenance systems that do not interrupt live ingestion

12:50 - 1:00 | Break

1:00 - 1:30 | Apache Doris User Sharing
Nilanjan Sarkar

Production experience taking Apache Doris from evaluation to live deployment
Practical challenges and decisions made along the way

1:30 | Snacks & Networking
