Data Engineering for Growth Analytics — Part 2: Attribution Modeling with GA4 Data
In Part 1, we threw out the idea of rebuilding the GA4 interface in SQL. Instead, we designed what actually matters: clean anonymous entities and structured touchpoints from raw event data. All on the whiteboard, no slides.
In Part 2, we take those building blocks and answer the question everyone actually cares about: where are my conversions coming from?
Here's the thing: most attribution setups I see in the wild either blindly trust Google's default channel grouping or build overly complex multi-touch models that nobody on the marketing team understands. Both miss the point.
What we'll design together:
A channel classification model that you actually control — no black-box defaults
First-touch and last-touch attribution from the touchpoint model we built in Part 1
Why the attribution waterfall matters: click IDs > UTMs > referrer > direct — and how to think about that priority
Where GA4's built-in session scoping breaks down — and when you need to roll your own
When multi-touch makes sense and when it's just noise
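To make the waterfall and the first-touch/last-touch idea a bit more concrete, here is a minimal Python sketch. It is not the model from the session: the Touchpoint fields, the channel labels, and both function names are hypothetical, chosen only to illustrate the priority order click IDs > UTMs > referrer > direct.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical touchpoint shape; the field names are assumptions,
# not the schema designed in Part 1.
@dataclass
class Touchpoint:
    entity_id: str
    ts: int                           # event time, e.g. epoch seconds
    click_id: Optional[str] = None    # e.g. a gclid value
    utm_source: Optional[str] = None
    utm_medium: Optional[str] = None
    referrer: Optional[str] = None


def classify(tp: Touchpoint) -> str:
    """Apply the waterfall: click IDs > UTMs > referrer > direct."""
    if tp.click_id:
        return "paid_click"
    if tp.utm_source or tp.utm_medium:
        return f"utm:{tp.utm_source or 'unknown'}/{tp.utm_medium or 'unknown'}"
    if tp.referrer:
        return f"referral:{tp.referrer}"
    return "direct"


def first_and_last_touch(
    touchpoints: List[Touchpoint], conversion_ts: int
) -> Optional[Tuple[str, str]]:
    """Return (first-touch, last-touch) channels among touches up to the conversion."""
    prior = sorted(
        (tp for tp in touchpoints if tp.ts <= conversion_ts),
        key=lambda tp: tp.ts,
    )
    if not prior:
        return None
    return classify(prior[0]), classify(prior[-1])


touches = [
    Touchpoint("a", 1, utm_source="newsletter", utm_medium="email"),
    Touchpoint("a", 2, click_id="abc123", utm_source="google", utm_medium="cpc"),
    Touchpoint("a", 3),  # no click ID, UTMs, or referrer
]
result = first_and_last_touch(touches, conversion_ts=5)
```

Note that the second touch carries both a click ID and UTMs, and the click ID wins because it sits higher in the waterfall. In this sketch the last touch can come out as direct; whether direct touches should be skipped when picking the last touch is exactly the kind of priority decision the waterfall forces you to make explicitly.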
Same format as Part 1: mostly whiteboard design with some SQL to make it concrete. The goal is that you walk away understanding the architecture — generating the queries from there is the easy part.
You don't need to have attended Part 1 to follow along, but it helps. The entity and touchpoint models from that session are the foundation we'll build on here.