AI x Variant Interpretation
Technical talks on engineering challenges and interesting problems at the intersection of AI and genomic variant interpretation.
Talks cover sequence-to-function models for predicting molecular traits, personalized gene expression from individual genomes, deep learning for variant effect prediction, and benchmarks + evaluations for frontier models in genomics.
We'll hear from the following:
Ruchir Rastogi — Postdoctoral Scholar @ Kundaje Lab, Stanford
Using sequence-to-function models to predict personal molecular traits
Sequence-to-function models such as Enformer and AlphaGenome predict molecular readouts (transcription factor binding, chromatin accessibility, and gene expression) directly from DNA sequence, making them promising tools for predicting how mutations affect those traits. But work from our group and others shows they aren't yet accurate enough to explain expression differences between individuals based on their personal mutations: current models underperform simple linear baselines, and their predictions are sometimes strongly negatively correlated with measured data. We give an overview of potential causes and propose fixes on both the data and modeling fronts.
Shiron Drusinsky — Bioinformatics PhD Candidate, Pollard Lab @ UCSF/Gladstone
Challenges in deep learning prediction of gene expression from genetic variants and personal genomes
In a deeper dive into one particular failure mode, we examine why sequence-to-function models struggle to predict expression differences between individuals, even when fine-tuned on personal genomes. We fine-tuned models on data with controlled, artificial genome edits to expose where the learning problem lies. We show that these models cannot reliably learn enhancer-gene links from genetic variation, and that human populations don't contain enough regulatory variation to teach them. We propose perturbation-style datasets as a possible path forward.
Sayan Ghosal — Senior Research Scientist, AI/ML @ Chan Zuckerberg Initiative
VariantFormer: A hierarchical transformer integrating DNA sequences with genetic variations and regulatory landscapes for personalized gene expression prediction
Existing population-genetics and sequence-based models either fail to generalize to unseen variants or can't capture mutation interactions and tissue-specific regulation. We present VariantFormer, a hierarchical transformer that predicts personalized gene expression from individual genomes with mutation-aware sequence encoders, regulatory cross-attention, and tissue-conditioned expression prediction across local and long-range windows. We show that VariantFormer predicts expression in unseen donors, captures germline and somatic variant effects, and learns interpretable regulatory grammar.
Esther Robb — CS PhD Candidate, Montgomery Lab @ Stanford
Mapping gene-by-exercise effects in a human acute exercise cohort using deep learning and multi-omics
Exercise lowers the risk of cardiometabolic disease, autoimmune conditions, and all-cause mortality, and response traits like VO2 capacity are highly heritable, yet how genetics shapes exercise response is poorly understood. Using MoTrPAC multi-omics data from 174 participants during and after acute exercise, we mapped gene-by-exercise effects on regulation and expression and linked them to traits. We built a response-QTL model and a deep-learning variant effect predictor on skeletal muscle ATAC-seq, nominating thousands of exercise-dependent variants that overlap with existing GWAS datasets. Our work demonstrates a simple and interpretable framework for interrogating gene-by-environment variants across the genome.
Agenda
6:00 – 6:30 Meet others. Eat + drink.
6:30 – 8:00 Talks + Q&A
8:00 – 9:00 Socialize
