Presented by

PyData Amsterdam is a vibrant community of Python and data enthusiasts that provides a forum for users and developers of open-source data tools.

Taming Data Pipelines: Scaling Databricks & Linting dbt

Name: Taming Data Pipelines: Scaling Databricks & Linting dbt
Start: 2026-06-04T18:00:00.000+02:00
End: 2026-06-04T21:30:00.000+02:00
Location: Company.info B.V.

Amsterdam, Netherlands

About Event

Mark your calendars for our upcoming PyData Amsterdam meetup on Thursday, June 4, 2026! We are heading over to Company.info for an evening dedicated to the gritty reality of scaling data infrastructure and applying rigorous engineering to MLOps.

As data platforms grow, so do the bottlenecks. This edition is all about tackling those growing pains head-on. We are going to look under the hood of high-performing data teams to see how they keep their pipelines fast, reliable, and organized at scale. Expect real-world case studies on slashing integration test times during a complex Databricks migration, alongside a deep dive into enforcing architectural standards across sprawling dbt projects using custom Python-based linters.

If you build, maintain, or scale data pipelines, whether as a Data Engineer, ML Engineer, or Data Scientist, this session will equip you with actionable strategies to optimize your daily workflows and tame architectural chaos.

Ready to level up your engineering toolkit? Join us for an evening packed with hard-earned technical insights, great food, and excellent networking with the Amsterdam data community!

Agenda

18:00 - 18:55: Welcome with food and drinks!
18:55 - 19:00: Intro Company.info
19:00 - 19:45: Talk 1: Scaling our Integration Tests Workflows in Databricks - by Douwe Oosterhout
19:45 - 20:00: Short break
20:00 - 20:45: Talk 2: dbt-bouncer, a linter for dbt projects - by Pádraic Slattery
20:45 - 21:30: Networking + drinks

Talk 1: Scaling our Integration Tests Workflows in Databricks

By Douwe Oosterhout

As the number of workflows grew, so did the time spent waiting for integration tests to finish. What started as a minor inconvenience gradually became a significant bottleneck, directly impacting our team's development velocity and ability to ship changes quickly and confidently.

When it became clear that the status quo was no longer sustainable, we knew we needed a smarter approach: one that could scale alongside our growing codebase without sacrificing reliability. In this talk, we'll walk you through how we dramatically reduced integration test runtimes, making our feedback loops faster and our developers happier. The journey wasn't without its complications, however: we were simultaneously migrating our traditional workflows to Databricks Asset Bundles, an extra hurdle that forced us to adapt our approach along the way. We'll share how we navigated both challenges at once and what we learned in the process.

Bio: Douwe Oosterhout is Lead Data Engineer at Company.info. He is the technical team lead of the Dutch Organization team, which processes a significant part of Company.info's data. Besides building pipelines, he also works on making the data platform more accessible for others.

Talk 2: dbt-bouncer, a linter for dbt projects

By Pádraic Slattery

As dbt projects continue to grow in size and complexity, it is becoming increasingly common for each organisation to have its own conventions, standards and ways of working. But how can these conventions be maintained as the dbt project continues to expand? Does chaos reign, or can we use our engineering knowledge to maintain our conventions? This talk explores the use of dbt-bouncer, an open-source Python CLI that aims to allow dbt developers to encode and configure their own dbt conventions.
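To make the idea of "encoding a convention" concrete, here is a minimal Python sketch of a convention check run against a parsed dbt manifest. It does not use dbt-bouncer's API or configuration format and is not how the talk implements it; the naming rule (`stg_<source>__<entity>`), the `check_models` function, and the in-memory example manifest are all hypothetical. It only assumes that a manifest has a `nodes` mapping whose entries carry `resource_type`, `name`, and `description`.

```python
import re

# Hypothetical convention: staging models are named "stg_<source>__<entity>"
# and every model carries a non-empty description.
STAGING_NAME = re.compile(r"^stg_[a-z0-9]+__[a-z0-9_]+$")


def check_models(manifest: dict) -> list[str]:
    """Return human-readable violations found in a parsed dbt manifest."""
    violations = []
    for unique_id, node in sorted(manifest.get("nodes", {}).items()):
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, and other node types
        name = node.get("name", "")
        if name.startswith("stg_") and not STAGING_NAME.match(name):
            violations.append(
                f"{unique_id}: staging name does not match stg_<source>__<entity>"
            )
        if not (node.get("description") or "").strip():
            violations.append(f"{unique_id}: missing description")
    return violations


# Tiny in-memory stand-in for a manifest file (illustrative data only).
example_manifest = {
    "nodes": {
        "model.shop.stg_crm__customers": {
            "resource_type": "model",
            "name": "stg_crm__customers",
            "description": "Cleaned CRM customers.",
        },
        "model.shop.stg_orders": {
            "resource_type": "model",
            "name": "stg_orders",
            "description": "",
        },
        "test.shop.not_null_id": {
            "resource_type": "test",
            "name": "not_null_id",
        },
    }
}

for violation in check_models(example_manifest):
    print(violation)
```

Run in CI, a check like this could fail the build whenever the returned list is non-empty, which is one way a team's conventions can keep pace with a growing project.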

Bio: Pádraic Slattery is the Lead Field CTO at Xebia Data, working in analytics, data engineering, and business intelligence development across various industries. He helps clients at Xebia Data deploy data platforms, onboard business use cases, and improve employees' data literacy. His main areas of interest are DataOps and building robust data ingestion pipelines.

Directions

📍 Finding Company.info - Address: Abram Dudok van Heelstraat 2, 1096 BE Amsterdam. It is just a 5-minute walk from the Overamstel metro station.
