BioML Seminar 3.4 - Predicting genome-wide functional constraints with GPN-Star
[IN-PERSON EVENT IN BERKELEY]
Join us for a new seminar from the BioML group in Machine Learning at Berkeley, sponsored by Amplify Partners. This week we're hosting Chengzhong Ye, a PhD candidate in Prof. Yun Song's lab here at Berkeley.
Abstract:
Genomic language models (gLMs) have emerged as a powerful approach for learning genome-wide functional constraints directly from DNA sequence, yet NLP-style gLMs often demand substantial compute and still lag classical evolutionary models on key tasks. We present GPN-Star (Genomic Pretrained Network with Species Tree and Alignment Representation), a biologically grounded gLM with a phylogeny-aware architecture that integrates whole-genome alignments and species trees to model evolutionary relationships explicitly. Trained on alignments spanning vertebrate, mammalian, and primate timescales, GPN-Star attains state-of-the-art accuracy across diverse variant-effect prediction tasks in both coding and noncoding regions of the human genome. Analyses across timescales reveal task-dependent advantages of emphasizing more recent versus deeper evolutionary signal. In human genetics applications, GPN-Star improves prioritization of pathogenic and fine-mapped GWAS variants, yields strong enrichments of complex-trait heritability, and increases power in rare-variant association testing. Extending beyond humans, we apply GPN-Star to mouse, chicken, fly, worm, and Arabidopsis, demonstrating robustness and generalizability. Overall, GPN-Star provides a scalable and flexible framework for genome interpretation that leverages expanding comparative genomics resources.
Bio:
Chengzhong Ye is a PhD candidate in Yun Song's lab at UC Berkeley. His recent research focuses on developing genomic language models with biologically grounded designs. More broadly, he works on machine learning methods that leverage evolutionary data to study human genetic variation and disease. He has previously worked on protein variant effect prediction and single-cell omics. Prior to joining Berkeley, he received his BS and MD degrees from Tsinghua University.