OLake Community Events
We organise community events and webinars on data engineering topics such as CDC, Apache Iceberg, and ETL from databases to data lakehouses.
Hosted By
52 Went

OLake 10th Community Call

Virtual
Past Event
About Event

What we’ll cover

New Sources Added

We’ve expanded OLake’s source ecosystem with new, production-ready integrations:

  • S3 Source Integration
    Read data directly from S3-compatible storage in CSV, JSON, and Parquet formats.
    Works with AWS S3, MinIO, and LocalStack, supports IAM-based authentication, and enables flexible file discovery using glob patterns.

  • MSSQL Source
    Native support for Microsoft SQL Server as a source, allowing teams to ingest data from existing MSSQL deployments and write it into Apache Iceberg.

  • DB2 Source
    Enterprise-grade DB2 source support, enabling seamless ingestion from IBM DB2 into Iceberg-backed lakehouse architectures.

For all new sources, documentation has been added so teams can easily plug these into their existing architecture and push data into Apache Iceberg.
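To make the glob-based file discovery mentioned for the S3 source more concrete, here is a minimal sketch in Python. It is not OLake's implementation: the function name `discover_files`, the sample object keys, and the use of `fnmatch` semantics (where `*` also matches `/`) are all assumptions made for illustration, and OLake's actual pattern rules may differ.

```python
from fnmatch import fnmatchcase

# Formats named in the S3 source description above.
SUPPORTED_EXTENSIONS = (".csv", ".json", ".parquet")


def discover_files(object_keys, pattern):
    """Return keys that match the glob pattern and have a supported extension.

    Hypothetical helper: uses fnmatch rules, so '*' can span '/' separators.
    """
    return [
        key
        for key in object_keys
        if fnmatchcase(key, pattern) and key.lower().endswith(SUPPORTED_EXTENSIONS)
    ]


# Made-up object keys, standing in for a bucket listing.
keys = [
    "exports/2024/orders.csv",
    "exports/2024/orders.parquet",
    "exports/2025/users.json",
    "exports/2025/readme.txt",
]

print(discover_files(keys, "exports/2024/*"))
print(discover_files(keys, "exports/*/*.json"))
```

The first call selects both 2024 files; the second selects only the JSON file, and `readme.txt` is never picked up because its format is not supported.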


MOR → COW Architecture Improvements

OLake ingests CDC data using Merge-on-Read (MOR) with equality deletes. However, many query engines (such as Databricks and Snowflake) do not fully support equality deletes, which can lead to incorrect query results.

To address this, we’ve introduced a MOR to COW compaction script that:

  • Periodically converts MOR tables into Copy-on-Write (COW)

  • Produces clean, query-ready Iceberg tables

  • Uses WAP (Write-Audit-Publish) for atomic checkpointing

  • Supports idempotent re-runs and automatic failure recovery

  • Ensures correctness without sacrificing ingestion performance
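The idea behind the points above can be sketched with a small in-memory model. This is not the OLake compaction script: the classes `DataRow`, `EqualityDelete`, and `Table`, the `last_compacted_seq` checkpoint, and the duplicate-key audit are hypothetical names and choices used only to illustrate the flow. The sketch assumes the Iceberg rule that an equality delete removes matching rows written with a strictly lower sequence number.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataRow:
    seq: int    # sequence number of the commit that wrote the row
    key: int    # primary key matched by equality deletes
    value: str


@dataclass(frozen=True)
class EqualityDelete:
    seq: int    # sequence number of the commit that wrote the delete
    key: int


@dataclass
class Table:
    main: list = field(default_factory=list)  # published COW rows
    last_compacted_seq: int = 0               # checkpoint for idempotent re-runs


def resolve_deletes(rows, deletes):
    """Keep a row unless a delete with the same key has a higher sequence number."""
    newest_delete = {}
    for d in deletes:
        newest_delete[d.key] = max(d.seq, newest_delete.get(d.key, d.seq))
    return [r for r in rows if r.seq >= newest_delete.get(r.key, -1)]


def run_compaction(table, rows, deletes, up_to_seq):
    """Write-Audit-Publish in miniature; returns False when there is nothing to do."""
    if up_to_seq <= table.last_compacted_seq:
        return False  # already published: a re-run is a no-op
    # Write: build the clean rows in a staging area, not in table.main.
    staged = resolve_deletes(
        [r for r in rows if r.seq <= up_to_seq],
        [d for d in deletes if d.seq <= up_to_seq],
    )
    # Audit: a CDC table should hold at most one live row per key.
    staged_keys = [r.key for r in staged]
    if len(staged_keys) != len(set(staged_keys)):
        raise ValueError("audit failed: duplicate keys in staged data")
    # Publish: swap in the staged rows and advance the checkpoint together.
    table.main = staged
    table.last_compacted_seq = up_to_seq
    return True


rows = [DataRow(1, 1, "a"), DataRow(1, 2, "b"), DataRow(2, 1, "a-updated")]
deletes = [EqualityDelete(2, 1), EqualityDelete(3, 2)]  # update of key 1, delete of key 2

table = Table()
print(run_compaction(table, rows, deletes, up_to_seq=3))  # first run publishes
print(run_compaction(table, rows, deletes, up_to_seq=3))  # re-run does nothing
print(table.main)
```

After the first run, `table.main` holds only the updated row for key 1. If the audit fails, nothing is published and the checkpoint does not move, so the next run starts from the same state.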


Kubernetes & Job Execution Enhancements

We’ve introduced major improvements to job execution and scheduling:

  • Transition from Job Mapping to Job Profiles

  • Zero-based mapping support

  • Full Kubernetes scheduling control using:

    • NodeSelector

    • Tolerations

    • Affinity

  • Backward compatibility with existing job mappings

These changes provide better scalability, flexibility, and control in Kubernetes-based deployments.
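For readers less familiar with the three scheduling controls, the sketch below assembles them into a Kubernetes pod spec fragment. `nodeSelector`, `tolerations`, and `affinity` are standard PodSpec fields; everything else is an assumption for illustration, including the helper name `build_pod_scheduling`, the label and taint values, and the idea that a Job Profile maps onto these fields this way. The actual Job Profile schema is described in the OLake documentation.

```python
import json


def build_pod_scheduling(node_selector=None, tolerations=None, affinity=None):
    """Return a PodSpec fragment containing only the scheduling fields that were set."""
    spec = {}
    if node_selector:
        spec["nodeSelector"] = dict(node_selector)
    if tolerations:
        spec["tolerations"] = list(tolerations)
    if affinity:
        spec["affinity"] = dict(affinity)
    return spec


# Hypothetical values: the labels, taint, and zone are made up for this example.
profile = build_pod_scheduling(
    node_selector={"workload": "ingestion"},
    tolerations=[
        {"key": "dedicated", "operator": "Equal", "value": "olake", "effect": "NoSchedule"}
    ],
    affinity={
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": "topology.kubernetes.io/zone",
                                "operator": "In",
                                "values": ["us-east-1a"],
                            }
                        ]
                    }
                ]
            }
        }
    },
)

print(json.dumps(profile, indent=2))
```

Roughly, the node selector pins the job to labelled nodes, the toleration lets it run on nodes tainted for dedicated use, and the affinity rule restricts it to a zone. Leaving an argument out omits that field entirely, so the scheduler's defaults apply.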


Community Highlights

This call also focuses on the people behind OLake:

  • Contributor spotlights and shoutouts

  • Updates from the Social Winter of Code (SWOC) program

  • Recognition of new contributors and their impact

  • Highlights from recent community blogs and company case studies 


Future Events

We’ll close the session by sharing upcoming:

  • Community calls

  • Hackathons and workshops

  • Opportunities to contribute and get involved
