BioML Seminar 4.4 - Scaling Perturbation-Trained Single-Cell Foundation Models to 3 Billion Parameters
[IN-PERSON EVENT IN BERKELEY]
Join us for a new seminar from the BioML group in Machine Learning at Berkeley, sponsored by Amplify Partners. This week, we're excited to host Shreshth Gandhi, Director of Machine Learning at Tahoe Bio!
Talk Abstract:
Understanding how cells respond to drugs is fundamental to discovering new therapeutics, but experimentally mapping every drug-cell combination is intractable. Foundation models offer a path forward: pretrain on massive, diverse datasets and generalize to unseen settings. In this talk, I will present Tahoe-x1, a family of single-cell foundation models scaled to 3 billion parameters and pretrained on a corpus of 250M cells, including Tahoe-100M, the largest single-cell perturbation dataset to date. I will discuss how training on 100 million perturbation profiles across 50 cancer cell lines and 1,100 compounds allows these models to predict drug effects in cellular contexts never seen during training, and what this means for accelerating drug discovery in data-limited oncology settings. I will also cover practical lessons from scaling transformers on single-cell data and share our perspective on what's needed to build toward a virtual cell.
Speaker Bio:
Shreshth Gandhi is the Director of Machine Learning at Tahoe Bio, where he works on foundation models for single-cell genomics and drug discovery. He is the first author of Tahoe-x1. Previously, he spent six years at Deep Genomics, where he contributed to BigRNA, a foundation model for RNA biology. He holds an M.A.Sc. in ECE from the University of Toronto, where his thesis focused on ML for genomics, and a B.Tech. in Electrical Engineering from IIT Kanpur.