

InstaFormer: Holistic Order Prediction in Natural Scenes
Hosted by: AER LABS
Date: Saturday, December 13, 2025
Time: 12:00 PM - 1:00 PM SG
Location: Network School Library
🌿 About the Event
Understanding instance-wise geometries in controlled settings is already a challenging task for many visual models. In the wild, this becomes even harder.
While specialized systems exist, modern state-of-the-art approaches often rely on expensive inputs (such as category labels or binary segmentation masks) and suffer from high inference costs (the number of forward passes grows quadratically with the number of instances).
In this talk, Pierre Musacchio (Ph.D. Student, SNU) will present InstaFormer, a novel network designed to mitigate these limitations through holistic order prediction.
Key Takeaways:
Single Forward Pass: Unlike previous methods, InstaFormer returns full occlusion and depth orderings for all instances in a scene from a single input RGB image alone.
Novel Architecture: Learn how the network leverages interactions between object queries and latent mask descriptors to semantically represent objects while carrying complementary geometric information.
Efficiency: Discover how this approach achieves superior results with a favorable inference-memory trade-off.
Future Applications: Explore how InstaFormer opens the door to zero-shot geometric understanding using Vision-Language Models (VLMs).
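To make the inference-cost point concrete, here is a minimal sketch that only counts forward passes; it is not InstaFormer's code. It assumes a pairwise baseline that runs one forward pass per ordered pair of instances (A relative to B, and B relative to A), which is one way to arrive at the quadratic cost mentioned above. The function names are hypothetical and chosen for illustration.

```python
def pairwise_forward_passes(num_instances: int) -> int:
    """Passes for a hypothetical baseline that queries every ordered pair (A, B), A != B."""
    return num_instances * (num_instances - 1)


def holistic_forward_passes(num_instances: int) -> int:
    """A holistic predictor returns all pairwise orderings from one pass over the image."""
    return 1


# Compare how the two costs scale as a scene gets more crowded.
for n in (2, 5, 10, 20):
    print(f"{n:>2} instances: pairwise={pairwise_forward_passes(n):>3}, "
          f"holistic={holistic_forward_passes(n)}")
```

If a baseline queries unordered pairs instead, the count halves but still grows quadratically, while the holistic count stays at one regardless of how many instances the scene contains.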
🎤 About the Speaker
Pierre Musacchio
Ph.D. Student, Computer Science & Engineering, Seoul National University
Pierre is a French Ph.D. student at the VGI Lab (Visual & Geometric Intelligence Lab) at Seoul National University. Previously, he was a member of the CV Lab at POSTECH.
His research focuses on instance-wise understanding and fundamental challenges in computer vision. He is passionate about geometric deep learning and is actively seeking collaboration with researchers interested in pushing the boundaries of scene understanding.