BioML Seminar 3.6 - Genomic language modeling for context-aware discovery
[IN PERSON EVENT IN BERKELEY]
Join us for a new seminar from the BioML group in Machine Learning at Berkeley, sponsored by Amplify Partners. This week, we're excited to host Dr. Yunha Hwang!
Abstract:
The vast majority of sequenced genes are functionally uncharacterized, and this “functional dark matter” continues to grow exponentially with metagenomic sequencing. Addressing this challenge requires new sequence understanding methods that go beyond similarity-based analysis of protein sequences and structures. Genomic language modeling presents a scalable method of incorporating genomic context, an often overlooked but critical axis of information, to extract evolutionary patterns pertaining to biological function. The genomic language model, trained on billions of metagenomic sequences, demonstrates potential for context-aware discovery of gene functions and design of multi-protein systems.
Bio:
Yunha Hwang is an Assistant Professor at MIT with a shared appointment between Biology, EECS, and the Schwarzman College of Computing. She is also a Co-founder and Chief Scientist at Tatta Bio, a scientific nonprofit dedicated to advancing genomic AI for biological discovery. She completed her Ph.D. in Biology at Harvard University and her B.S. in Computer Science at Stanford University. Her research interests span machine learning for sustainable biomanufacturing, microbial evolution, and open science.