Presented by
AI Safety Poland
AI Safety Poland is a community in Poland dedicated to reducing the risks posed by artificial intelligence.
33 Going

Build Your Own GPT-2

Warszawa, Województwo mazowieckie
Registration
Approval Required
Your registration is subject to approval by the host.
Welcome! To join the event, please register below.
About Event

AI Safety Poland, together with the Artificial Intelligence Society Golem at Warsaw University of Technology, is excited to invite you to a hands-on, half-day workshop where we’ll build GPT-2 from scratch!

​💁 Topic: Build Your Own GPT-2
📣 Teacher: David Quarel
🇬🇧 Language: English
🗓️ Date: 16.12.2025, 16:00-20:00
📍 Location: Warsaw University of Technology

About You

This workshop is for people who:

  • Are very comfortable writing and debugging Python.

  • Have some familiarity with PyTorch or NumPy (we will be using PyTorch).

  • Have some ML background (nice, but not required). It's sufficient to know that a neural network is approximately "a bunch of matrix multiplications with non-linear activation functions in between".

  • Have ~4 hours to spare before the workshop on the prerequisites (may be less, depending on background).
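The "bunch of matrix multiplications with non-linearities in between" picture really is the whole idea. A minimal sketch in PyTorch, with toy sizes and random weights chosen purely for illustration:

```python
import torch

# A tiny two-layer network written as plain matrix multiplications with a
# non-linear activation in between. All sizes here are arbitrary toy values.
x = torch.randn(4, 8)        # a batch of 4 inputs, 8 features each
W1 = torch.randn(8, 16)      # first weight matrix
W2 = torch.randn(16, 3)      # second weight matrix

hidden = torch.relu(x @ W1)  # matmul, then non-linear activation
out = hidden @ W2            # another matmul
print(out.shape)             # torch.Size([4, 3])
```

That level of comfort with tensors and shapes is all the ML background the workshop assumes.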

If this isn't you, you'll likely struggle and not enjoy the workshop.

Learning Objectives

  • Understand what a transformer is, and how it is used.

  • Learn what tokenization is at a high level.

  • Understand the causal attention mechanism in transformers, and how to construct it by hand.

  • Understand what logits are, and how to use them to derive a probability distribution over the vocabulary.
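To preview the last objective: logits are the unnormalised scores a model assigns to each vocabulary entry, and softmax turns them into probabilities. A minimal sketch with made-up numbers (a real GPT-2 head outputs ~50k logits, one per token):

```python
import torch

# Toy logits: one unnormalised score per vocabulary entry.
logits = torch.tensor([2.0, 1.0, 0.1, -1.0])

# Softmax exponentiates and normalises, giving a probability distribution.
probs = torch.softmax(logits, dim=-1)
print(probs)        # all entries positive
print(probs.sum())  # tensor(1.)
```

Sampling the next token then just means drawing from this distribution.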

After today, you will understand what exactly happens when you interact with large language models like GPT-2. We hide nothing*, no magic!

*Due to time constraints we do not implement the tokenizer. Andrej Karpathy has videos on this if you're curious, but we will black-box this (and only this) for today. It's just a look-up table from "words" (tokens) to numbers, so it's not that interesting, and isn't core to understanding how the transformer works.
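In the spirit of that look-up table, here is a toy sketch (the vocabulary below is invented for illustration; GPT-2's real tokenizer uses byte-pair encoding with ~50k entries):

```python
# A toy "tokenizer" as a plain look-up table: strings in, integers out.
vocab = {"hello": 0, "world": 1, ",": 2, "!": 3}
inv_vocab = {i: tok for tok, i in vocab.items()}

def encode(tokens):
    """Map each token string to its integer id."""
    return [vocab[t] for t in tokens]

def decode(ids):
    """Map integer ids back to token strings."""
    return [inv_vocab[i] for i in ids]

ids = encode(["hello", ",", "world", "!"])
print(ids)          # [0, 2, 1, 3]
print(decode(ids))  # ['hello', ',', 'world', '!']
```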

About the Teacher

David Quarel is a PhD student at the Australian National University (ANU), Canberra, Australia, working on AI safety at the London Initiative for Safe AI (LISA), as well as a teaching assistant for ARENA. David has years of experience as a teacher, developing content both for courses at the ANU and for ARENA. He recently co-authored a new textbook on Universal Artificial Intelligence. He now works as a research scientist at Timaeus.

Before the Event

Depending on your background, you may be able to skip bits and pieces. We estimate roughly 4 hours of content if you were to do all of it. Prioritise the Einops exercises, as you will not be able to build GPT-2 without them.

Prerequisite Material

1) Watch the following 3Blue1Brown videos.

Don't worry if you struggle with the attention video; understanding and constructing the attention mechanism is the main goal of the workshop.
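If you'd like a preview of where the workshop is headed, here is a minimal single-head causal attention sketch in PyTorch. The sizes are toy values (real GPT-2 uses 12 heads of width 64), and the workshop builds this up step by step:

```python
import torch

T, d = 5, 8            # sequence length, head dimension (toy values)
q = torch.randn(T, d)  # queries
k = torch.randn(T, d)  # keys
v = torch.randn(T, d)  # values

scores = q @ k.T / d**0.5  # pairwise attention scores, scaled

# Causal mask: each position may only attend to itself and earlier positions.
mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))

weights = torch.softmax(scores, dim=-1)  # rows sum to 1
out = weights @ v                        # weighted mix of values
print(out.shape)                         # torch.Size([5, 8])
```

Note that the first position can only attend to itself, so its attention weight on itself is exactly 1.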

Don't worry about backpropagation; we will not be training any models. It's enough to understand the goal the optimiser has, and that it somehow adjusts the weights to minimise the loss function.

2) Skim the background material, especially the Neural Networks and Linear Algebra sections. This material covers the entire ARENA course; while you won't need all of it for the workshop, it may still be a useful reference.

3) Watch the video "Einsum Is All You Need", which provides a good introduction to einsum notation, and attempt the prerequisite exercises under "Einops, Einsum & tensors".

Don't worry if you can't get through all the exercises! Getting a feel for reduce, reshape and einsum is the main objective here.
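To illustrate what "a feel for einsum" means in practice, here is a small PyTorch sketch (the einops library spells the same reduce/reshape operations with named axes, e.g. `reduce(x, "b h w -> b w", "mean")`):

```python
import torch

# einsum writes tensor contractions with named indices. Below: a batched
# matrix multiply, the core operation inside attention.
a = torch.randn(2, 3, 4)                # (batch, rows, inner)
b = torch.randn(2, 4, 5)                # (batch, inner, cols)
c = torch.einsum("bij,bjk->bik", a, b)  # contract over the shared j axis
print(c.shape)                          # torch.Size([2, 3, 5])

# Reduce and reshape in plain torch: mean over one axis, then flatten two.
x = torch.arange(24.0).reshape(2, 3, 4)
print(x.mean(dim=1).shape)              # torch.Size([2, 4])
print(x.reshape(2, 12).shape)           # torch.Size([2, 12])
```

If expressions like these feel readable, you're ready for the workshop.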
