

Diffusion Model Meetup & Paper Reading — Attention Is All You Need & the Transformer Model Architecture
TL;DR
In this session, we’ll walk through one of the most important papers in modern AI — Attention Is All You Need — the 2017 paper behind the transformer architecture that powers models like ChatGPT.
We’ll break down the paper step by step — understanding what “attention” means, why it changed the field, and how transformers are built. No code. Minimal math. Just a clear, intuitive walkthrough of the ideas and architecture that reshaped machine learning.
This session is part of our ongoing Diffusion Model Paper Reading Group, a friendly, online community across NY, SF, Toronto, and Boston — open to anyone curious about AI.
👌 Learning Requirements
You’ll be fine as long as you’re:
Curious about how transformer models actually work
Comfortable skimming a paper and engaging in discussion
Open to learning visually and conceptually (no coding or deep math required)
🗓 Schedule
First 60 min:
We’ll walk through the “Attention Is All You Need” paper — focusing on:
The motivation and innovation behind the transformer
What “attention” really means in plain language
Key building blocks: self-attention, multi-head attention, positional encoding, and residual connections
How these concepts form the backbone of today’s GenAI models
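No code is required for the session itself, but for readers who want a concrete anchor beforehand: the "attention" in the paper boils down to a single formula, softmax(QKᵀ/√d_k)·V. Below is a minimal, dependency-free Python sketch of scaled dot-product attention with made-up toy numbers (the variable values are illustrative, not from the paper):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    Q, K, V are lists of vectors (lists of floats)."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Score each key against this query, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        # Turn scores into attention weights that sum to 1
        weights = softmax(scores)
        # Output is a weighted average of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Toy example: 3 tokens, vectors of dimension 2 (hypothetical values)
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(attention(Q, K, V))
```

Each output row is a convex combination of the value vectors, weighted by how well the query matches each key — that is the whole trick. Multi-head attention simply runs several of these in parallel with different learned projections.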
Final 30 min:
Open discussion and Q&A — a space to clarify what still feels fuzzy and prepare together for next week’s session on Diffusion Transformers.
If you’re planning to attend next week’s Diffusion Transformers session, register here:
https://luma.com/lr2qvveq
📚 Pre-Class Learning
📄 Paper: Attention Is All You Need
https://papiers.ai/1706.03762
Pick one video based on your level of curiosity:
Easy: 3Blue1Brown – Attention in Transformers, Step-by-Step (26 min)
Medium: Yannic Kilcher – Attention Is All You Need (27 min)
Advanced: Andrej Karpathy – Stanford CS25: Introduction to Transformers (1 hr 11 min)
👥 Speakers
Led by master’s and PhD students in AI, IBM AI consultants, and CTOs of award-winning AI startups — all experienced in helping learners deeply understand transformer architecture.
The highlight of this session is clarity — by the end, you’ll understand how and why transformers work, once and for all.
🧠 About the Diffusion Model Reading Group & Bootcamp
A peer-led, 5‑month learning journey for engineers, students, researchers, and builders exploring diffusion model architectures and modern AI.
No ML background required — just curiosity
2–4 hours/week with paper readings, discussions, and final projects
A supportive community that includes people working in the industry