LLM fine-tuning with GRPO

Building AI Together | Union.ai

YouTube

About Event

Supervised fine-tuning teaches a model to imitate examples. Reinforcement fine-tuning teaches it to optimize an objective you define, which is how recent open models picked up reasoning, reliable tool use, and consistent output formats. GRPO (Group Relative Policy Optimization) is the method behind a lot of that work, and it is simpler and cheaper to run than the PPO setups that came before it.

GRPO drops the separate value model that makes PPO expensive. For each prompt, it samples a group of completions, scores them with a reward function, and compares each score against the group's own mean and spread to estimate which responses were better than average. You train on prompts plus a reward function instead of a large hand-labeled dataset, and verifiable rewards (did the answer match, did the output parse, did it hit the format) get you a long way without preference annotation.
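
The group-relative scoring described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the workshop's code: the function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` value are assumptions chosen for clarity, and real trainers apply this per prompt across a batch of tensors.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each completion relative to the other completions for the same prompt.

    rewards: one float per sampled completion in the group.
    eps: small constant (an assumption) that avoids division by zero
         when every completion in the group gets the same reward.
    """
    mu = mean(rewards)        # the group's average reward acts as the baseline
    sigma = pstdev(rewards)   # the group's spread sets the scale
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions sampled for one prompt, scored by a pass/fail reward.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Completions that beat the group average get a positive advantage,
# the rest get a negative one.
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

Because the baseline comes from the group itself, no learned value model is needed. Note that a group where every completion earns the same reward yields all-zero advantages, so prompts that are always solved or never solved contribute no learning signal.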

In this hands-on workshop, we'll fine-tune an open-weight LLM with GRPO using Hugging Face TRL, write reward functions that actually shape behavior, and deploy the result behind a simple UI. The whole pipeline runs on Flyte 2/Union, so data prep is cached, runs are reproducible and recoverable, and the same code scales from a laptop to a multi-node cluster without rewrites.

By the end, you'll have a working GRPO-trained model and a reusable RL pipeline you can point at your next task.

What we'll cover

A practical intro to GRPO
Writing reward functions
Sandboxes for safe code execution during training
Fine-tuning an open-weight LLM with Hugging Face TRL's GRPOTrainer
Orchestrating with Flyte 2: cached data prep, GPU-aware training, and durable, reproducible runs at any scale
Deploying the model with a UI, with a path to scaled inference
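
As a rough illustration of the reward-function topic above, here is a minimal sketch of two verifiable rewards. It assumes plain-string completions and follows the calling convention TRL documents for custom reward functions in GRPOTrainer (a `completions` list plus dataset columns passed as keyword arguments, returning one float per completion). The `<answer>` tag format and the `answer` column name are hypothetical choices for this example, not something the workshop specifies.

```python
import re

# Hypothetical output format: the final answer is wrapped in <answer> tags.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def format_reward(completions, **kwargs):
    """Return 1.0 when a completion has exactly one <answer>...</answer> block."""
    return [1.0 if len(ANSWER_RE.findall(c)) == 1 else 0.0 for c in completions]


def correctness_reward(completions, answer, **kwargs):
    """Return 1.0 when the extracted answer matches the reference answer.

    `answer` is assumed to be a dataset column holding one reference
    string per completion.
    """
    rewards = []
    for completion, expected in zip(completions, answer):
        match = ANSWER_RE.search(completion)
        predicted = match.group(1).strip() if match else None
        rewards.append(1.0 if predicted == str(expected).strip() else 0.0)
    return rewards


# Quick check on hand-written completions, no model required.
completions = ["<answer>42</answer>", "The answer is 42", "<answer> 7 </answer>"]
references = ["42", "42", "8"]
print(format_reward(completions))
print(correctness_reward(completions, answer=references))
```

Keeping the format check and the correctness check as separate functions makes it easier to see which behavior the model is actually learning, and to reweight or drop one without touching the other.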

What you'll leave with

An LLM fine-tuned with GRPO against a reward function you wrote
A reusable RL training and deployment pipeline you can adapt to your own task
The knowledge to design reward functions and prompt sets for future GRPO projects

Who it's for

ML engineers and practitioners who want to move past prompt engineering and supervised fine-tuning, and shape model behavior with reinforcement learning. Whether you're prototyping at work, evaluating infrastructure for a production use case, or building a portfolio project, you'll leave with code you can keep extending.

Hosted by Sage Elliott, AI Engineer at Union.ai
