

Fine-Tune Friday: Qwen 3.5 Goes Multimodal 🎬👀 – Understanding Images & Videos to Detect Actors in Scenes!
Welcome to this week’s edition of our favorite series from the Oxen.ai Herd, Fine-Tune Fridays. Each week, we take an open-source model and put it head-to-head with a closed-source foundation model on a specialized task.
We share practical, end-to-end examples, including reference data, model weights, and the full infrastructure needed to reproduce the experiments on your own.
This Week
We’re diving into the new Qwen 3.5 series with a focus on multimodality. We’ll test how fast and capable the smaller models are for image and video tasks, like detecting actors in a scene, and see how the larger models perform across text, image, and video understanding.
We’ll also walk through our fine-tuning and deployment pipeline and show how you can reproduce the experiments and fine-tune these powerful models yourself using Oxen.ai.
Looking forward to seeing you there!