

Fine Tuning Friday - Qwen & WAN 2.2 - Multi-Character Video Generation
Welcome to a new series from the Oxen.ai Herd called Fine Tuning Fridays! Each week we will take an open-source model and put it head-to-head against a closed-source foundation model on a specialized task.
We will give you practical examples with reference code, reference data, model weights, and the end-to-end infrastructure to reproduce the experiments on your own.
The format will be live on Zoom, Fridays at 10am PST, similar to our Arxiv Dive series in the past.
This Week
We are creating a fine-tuning pipeline with three stages (a rough end-to-end sketch follows the list):
1) Segment out the characters in the scene (using DINOv3)
2) Train a Qwen-Image-Edit LoRA that maps each mask color to a character, going from colored mask -> characters in the correct locations
3) Use WAN 2.2 Image-to-Video to turn the still frame into a video
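To make the three stages concrete, here is a minimal sketch of what the inference side could look like, assuming the Hugging Face transformers and diffusers libraries. The DINOv3 checkpoint name, the k-means clustering used to turn patch features into a colored mask, the LoRA path, the prompts, and the Wan 2.2 repo id are all illustrative assumptions, not the exact setup we will use live:

```python
import numpy as np
import torch
from PIL import Image
from sklearn.cluster import KMeans
from transformers import AutoImageProcessor, AutoModel
from diffusers import QwenImageEditPipeline, WanImageToVideoPipeline
from diffusers.utils import export_to_video

device = "cuda"

# 1) Rough per-character masks from DINOv3 patch features.
#    (Checkpoint name is an assumption; use whichever DINOv3 variant you have access to.)
repo = "facebook/dinov3-vitb16-pretrain-lvd1689m"
processor = AutoImageProcessor.from_pretrained(repo)
backbone = AutoModel.from_pretrained(repo).to(device).eval()

frame = Image.open("scene.png").convert("RGB")
inputs = processor(images=frame, return_tensors="pt").to(device)
with torch.no_grad():
    tokens = backbone(**inputs).last_hidden_state[0]  # (num_tokens, dim)

# Patch tokens come after the CLS/register tokens, laid out on a square grid.
side = inputs["pixel_values"].shape[-1] // backbone.config.patch_size
patch_tokens = tokens[-side * side :].float().cpu().numpy()
labels = KMeans(n_clusters=3).fit_predict(patch_tokens)

# One flat color per cluster: red/green for the two characters, blue for background.
palette = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255]], dtype=np.uint8)
mask = Image.fromarray(palette[labels.reshape(side, side)]).resize(frame.size, Image.NEAREST)
mask.save("mask.png")

# 2) Colored mask -> still frame with the characters, via Qwen-Image-Edit
#    plus the fine-tuned LoRA (the LoRA path and prompt are placeholders).
edit_pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16
).to(device)
edit_pipe.load_lora_weights("path/to/character-lora")
still = edit_pipe(
    image=mask,
    prompt="red region -> host character, green region -> guest character, talk show set",
    num_inference_steps=30,
).images[0]

# 3) Still frame -> short clip with a Wan image-to-video pipeline.
i2v_pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.2-I2V-A14B-Diffusers", torch_dtype=torch.bfloat16
)
i2v_pipe.enable_model_cpu_offload()  # the video model is large; offload to fit one GPU
frames = i2v_pipe(
    image=still.resize((832, 480)),
    prompt="Two talk show hosts having an animated conversation",
    height=480, width=832, num_frames=81,
).frames[0]
export_to_video(frames, "interview.mp4", fps=16)
```

The LoRA in step 2 is the piece that actually gets fine-tuned: it would be trained on (colored mask, original frame) pairs so each color becomes a reliable handle for one character, which is what should let us re-pose the characters without losing their identities.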
Join us if you want to see if we can get consistent generations of Conan O'Brien interviewing Will Smith :)
Have you been fine-tuning video/image models recently? If so, join! We'd love to see your work.