164 Went

Systems Reading Group with Arc Institute, LatchBio + FutureHouse

Hosted by Kenny Workman & Shreya Shekhar
San Francisco, California
About Event

The intersection of computing and engineering biology is a playground for systems: operating systems, file systems, virtualization, programming languages, databases, compilers, fuzzers, distributed systems, etc.

In this biotech-flavored version of the SF systems reading group, we'll hear from four awesome speakers who will walk through design decisions, paper highlights, and snippets of source code:

  • Aidan Abdulali | LatchBio: A Distributed Filesystem Built on Postgres and S3

  • Noam Teyssier | Arc Institute: BINSEQ: A Family of High-Performance Binary Formats for Nucleotide Sequences

  • James Braza | FutureHouse: Edge of Tomorrow Algorithms 

  • Abhinav Adduri | Arc Institute: Scaling Deep Learning to 1B+ Single Cells

Event space is provided by LatchBio, and Greylock is generously sponsoring food and refreshments.

Important: Our office (Lobby 5) is on the 4th Street side of the building. Enter from the river side through the sliding doors, or through the lobby on the Berry St. side.

Agenda

  • 5:30 - 6:30 Meet others. Eat + drink.

  • 6:30 - 8:00 Talks + Q&A

  • 8:00 - TBD Socialize

Abstracts

LData: A Distributed Filesystem Built on Postgres and S3
Aidan Abdulali | LatchBio

> LatchBio builds data infrastructure to store, analyze and visualize large volumes of molecular data. A core component of this platform is a distributed file system called LData. This talk walks through its architecture and illustrates how to build a complex distributed system with little more than a database.

BINSEQ: A Family of High-Performance Binary Formats for Nucleotide Sequences
Noam Teyssier | Arc Institute

> Modern genomics produces billions of sequencing records per run, which are typically stored as gzip-compressed FASTQ files. While this format is widely used, it is not optimal for high-throughput processing due to its reliance on single-threaded decompression and sequential parsing of irregularly sized records. Here, we present BINSEQ, a family of simple binary formats that enable high-throughput parallel processing of sequencing data. We demonstrate that BINSEQ files are up to 32x faster than compressed FASTQ for parallel processing and can reduce analysis time from hours to minutes for large-scale genome and transcriptome analyses, particularly for resource-intensive applications like alignment, mapping, and de novo assembly.
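The core argument here is that fixed-size binary records, unlike variable-length gzipped FASTQ, let a reader compute where record i starts and split work across threads without a sequential scan. Below is a minimal Python sketch of that idea under a hypothetical layout (fixed read length, 2 bits per base, no header, no quality scores, no N bases). It is not the BINSEQ specification, and the names `pack`, `unpack`, `record_at`, and `count_gc_parallel` are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout (NOT the actual BINSEQ spec): every record holds a
# fixed-length read of READ_LEN bases, packed at 2 bits per base.
READ_LEN = 8
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODE = "ACGT"
RECORD_BYTES = (2 * READ_LEN + 7) // 8  # bytes per packed record


def pack(seq: str) -> bytes:
    """Pack a fixed-length ACGT string, first base in the highest bits."""
    assert len(seq) == READ_LEN
    value = 0
    for base in seq:
        value = (value << 2) | ENCODE[base]
    return value.to_bytes(RECORD_BYTES, "big")


def unpack(record: bytes) -> str:
    """Inverse of pack: read 2-bit codes from the low bits upward."""
    value = int.from_bytes(record, "big")
    bases = []
    for _ in range(READ_LEN):
        bases.append(DECODE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


def record_at(buf: bytes, i: int) -> str:
    # Fixed-size records: record i starts at i * RECORD_BYTES,
    # so no parsing of earlier records is needed to find it.
    start = i * RECORD_BYTES
    return unpack(buf[start:start + RECORD_BYTES])


def count_gc_parallel(buf: bytes, n_workers: int = 4) -> int:
    """Count G/C bases by giving each worker a contiguous range of records."""
    n_records = len(buf) // RECORD_BYTES
    chunk = max(1, (n_records + n_workers - 1) // n_workers)

    def work(lo: int) -> int:
        hi = min(lo + chunk, n_records)
        total = 0
        for i in range(lo, hi):
            seq = record_at(buf, i)
            total += seq.count("G") + seq.count("C")
        return total

    with ThreadPoolExecutor(n_workers) as pool:
        return sum(pool.map(work, range(0, n_records, chunk)))


reads = ["ACGTACGT", "GGGGCCCC", "AAAATTTT"]
buf = b"".join(pack(r) for r in reads)
print(record_at(buf, 1))
print(count_gc_parallel(buf))
```

In CPython the threads mostly illustrate the partitioning rather than a real speedup, because of the global interpreter lock; the point is that each worker can seek directly to its slice, which a gzip stream of variable-length FASTQ records does not allow.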

Edge of Tomorrow Algorithms 
James Braza | FutureHouse

> Imagine you're given a model, a benchmark, and just one day to saturate the benchmark. Normally training the model takes a week, but if you do not succeed in one day, the day resets. This talk is on a progression of algorithms from FutureHouse's aviary and ether0 papers that solve this exact problem, bringing us to the edge of tomorrow.

Scaling Deep Learning to 1B+ Single Cells
Abhinav Adduri | Arc Institute

> Single cell transcriptomics data repositories have experienced dramatic growth in recent years. Similar to how internet-scale data enabled a new intelligence frontier for language models, the wealth of observational and perturbational data being generated will enable cellular models that reveal new biological insights. However, computational tools have not kept pace with the rapid development of single cell assays, presenting challenges in training and evaluating models on these datasets. In this short talk, I’ll describe how we scaled STATE to 300M cells, what avoidable mistakes we made, and what advancements are needed to efficiently scale to 1B+ cells.

Excited to see you guys here and learn a bit more about computers.
