

Lakehouse at Scale: Doris x OLake
Apache Iceberg adoption is accelerating, and with it come two operational challenges that data teams are running into head-on: table maintenance at scale and the demand for real-time, accurate retrieval to power AI systems.
This meetup brings together practitioners and contributors working at both ends of that problem. Expect technical depth, real production context, and open discussion with engineers who are actively building and operating lakehouse infrastructure.
Agenda
11:00 - 11:30 | Registration and Welcome
11:30 - 12:10 | How Apache Doris Powers AI Agents with Hybrid Search and Real-Time Analytics | Matt Yi, Apache Doris PMC Member, Tech VP at VeloDB
Why single-method retrieval (vector-only or keyword-only) breaks down in production AI systems
Hybrid search architecture: combining vector search, full-text search, and SQL for accurate, intent-aware retrieval
How Apache Doris's native real-time OLAP capabilities extend to real-time RAG pipelines
Cost and accuracy tradeoffs across retrieval strategies, and what they mean for context engineering at scale
12:10 - 12:50 | OLake Fusion: Solving Apache Iceberg Table Maintenance Problems at High Scale | Ankit Sharma (Tech Lead) and Badal Prasad Singh (Software Engineer), OLake
Why continuous CDC ingestion at scale causes small-file accumulation and degrades query performance in Iceberg tables
Compaction strategies (lite, medium, full) and how to choose the right mode based on workload and file size targets
Cron-based scheduling and table-level enable/disable controls, configured via Helm and Docker Compose
Multi-catalog support and lessons from building maintenance systems that do not interrupt live ingestion
12:50 - 1:00 | Break
1:00 - 1:30 | Apache Doris User Sharing | Nilanjan Sarkar
Production experience taking Apache Doris from evaluation to live deployment
Practical challenges and decisions made along the way
1:30 | Lunch and Networking