Presented by
BioML @ Berkeley
Seminar series with researchers and leaders leveraging ML to stay at the cutting edge of biology.
67 Went

BioML Seminar 3.4 - Predicting genome-wide functional constraints with GPN-Star

Berkeley, California
About Event

[IN PERSON EVENT IN BERKELEY]

Join us for a new seminar from the BioML group in Machine Learning at Berkeley, sponsored by Amplify Partners. This week we're hosting Chengzhong Ye, a PhD candidate in Prof. Yun Song's lab here at Berkeley.

Abstract:
Genomic language models (gLMs) have emerged as a powerful approach for learning genome-wide functional constraints directly from DNA sequence, yet NLP-style gLMs often demand substantial compute and still lag classical evolutionary models on key tasks. We present GPN-Star (Genomic Pretrained Network with Species Tree and Alignment Representation), a biologically grounded gLM with a phylogeny-aware architecture that integrates whole-genome alignments and species trees to model evolutionary relationships explicitly. Trained on alignments spanning vertebrate, mammalian, and primate timescales, GPN-Star attains state-of-the-art accuracy across diverse variant-effect prediction tasks in both coding and noncoding regions of the human genome. Analyses across timescales reveal task-dependent advantages of emphasizing more recent versus deeper evolutionary signal. In human genetics applications, GPN-Star improves prioritization of pathogenic and fine-mapped GWAS variants, yields strong enrichments of complex-trait heritability, and increases power in rare-variant association testing. Extending beyond humans, we apply GPN-Star to mouse, chicken, fly, worm, and Arabidopsis, demonstrating robustness and generalizability. Overall, GPN-Star provides a scalable and flexible framework for genome interpretation that leverages expanding comparative genomics resources.

Bio:
Chengzhong Ye is a PhD candidate in Yun Song's lab at UC Berkeley. His recent research focuses on developing genomic language models with biologically grounded designs. More broadly, he works on machine learning methods that leverage evolutionary data to study human genetic variation and disease. He has previously worked on protein variant effect prediction and single-cell omics. Prior to joining Berkeley, he received his BS and MD degrees from Tsinghua University.
