

LLM fine-tuning with LoRA & QLoRA
Fine-tuning is one of the highest-leverage moves in applied LLM work, turning a general-purpose base model into something that actually understands your domain, your data, and your task. Fine-tuned open models can outperform prompt-engineered frontier APIs on narrow tasks, and they do it at a fraction of the inference cost, latency, and token spend. With LoRA and QLoRA, you can adapt a strong base model on a single GPU, keeping training cheap and deployment portable.
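The single-GPU economics come straight from the parameter counts. LoRA freezes the base weight matrix and trains only a low-rank update BA, where B is d_out × r and A is r × d_in, so trainable parameters drop from d_out·d_in to r·(d_out + d_in). A back-of-the-envelope sketch (the 4096×4096 projection and rank 16 are illustrative numbers, not the workshop's settings):

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when fine-tuning the full weight matrix W."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters for the LoRA update W + (alpha/r) * B @ A:
    B is d_out x r, A is r x d_in; W itself stays frozen."""
    return r * (d_out + d_in)


# Hypothetical 4096x4096 projection, roughly one attention matrix
# in a 7B-class model, with LoRA rank 16.
d, r = 4096, 16
full = full_params(d, d)    # 16,777,216 trainable params
lora = lora_params(d, d, r)  # 131,072 trainable params
print(f"full: {full:,}  lora: {lora:,}  ratio: {full // lora}x")  # 128x fewer
```

QLoRA pushes the memory budget further by also storing the frozen base weights in 4-bit precision, which is what makes 7B-class models trainable on a single consumer GPU.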
In this hands-on workshop, we'll fine-tune an open-weight LLM with LoRA and QLoRA on a custom dataset and deploy it behind a simple UI. The whole pipeline runs on Flyte 2/Union, so data prep is cached, training runs are reproducible and recoverable, and the same code scales from a laptop GPU to a multi-node cluster without rewrites. By the end, you'll have a working fine-tuned model and a reusable end-to-end pipeline you can extend to your next task.
What we'll cover
A practical intro to LoRA and QLoRA, and why parameter-efficient fine-tuning changed what's possible on a single GPU
Fine-tuning an open-weight LLM with Hugging Face TRL and PEFT
Orchestrating the pipeline with Flyte 2: cached data prep, GPU-aware training, and durable, reproducible runs at any scale
Deploying the model behind a simple UI, with a path to scaled inference
Patterns for extending to your own fine-tuning problem
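In code, the QLoRA recipe covered above typically reduces to two config objects: a 4-bit quantization config for loading the frozen base model, and a LoRA config for the trainable adapters. A hedged sketch using Hugging Face transformers and PEFT (all hyperparameter values here are common defaults, not the workshop's exact settings):

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization for the frozen base weights (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Low-rank adapters on the attention projections; r, alpha, and the
# target_modules list are illustrative and depend on the base model.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

The quantization config is passed to `AutoModelForCausalLM.from_pretrained(..., quantization_config=bnb_config)` when loading the base model, and the LoRA config goes to TRL's `SFTTrainer` as `peft_config`, so only the adapter weights are trained.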
What you'll leave with
A fine-tuned LLM trained on a custom dataset with LoRA or QLoRA
A reusable training and deployment pipeline you can adapt to your own data
The knowledge to build and curate datasets for future fine-tuning projects
A portfolio-ready project and a certificate of participation
Who it's for
ML engineers and practitioners who want to move past prompt engineering and adapt models to their own data. Whether you're prototyping at work, evaluating infrastructure for a production LLM use case, or building a portfolio project, you'll leave with code you can keep extending.
Hosted by Sage Elliott, AI Engineer at Union.ai