

NEXT.BIO 2025 - Enhancing food production through AI-driven plant genomics
In October 2025, A16Z #SFTechWeek will once again take over San Francisco. We are organizing NEXT.BIO, a world-class, week-long tech conference featuring the best startups, industry leaders, and investors in AI x BIO across a series of Tech Events, a unique Hackathon, and a DEMO Day 🚀
This event is part of our Tech Event series, which runs throughout the week-long conference.
🧩 Suggested Themes & Challenges:
Genomic Data Analysis
Analyze plant genome sequences to identify genes linked to traits like drought resistance or disease susceptibility.
AI in Plant Phenotyping
Develop machine learning models to predict plant traits from genomic data or images.
CRISPR Design Tools
Create software to design CRISPR guides for targeted gene editing in plants.
Sustainable Agriculture Solutions
Use genomic insights to propose methods for improving crop yield and resilience.
Educational Tools
Design interactive platforms to teach plant genomics concepts to students and the public.
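As a starting point for the CRISPR Design Tools theme, here is a minimal sketch of a guide finder. It assumes SpCas9-style targeting (a 20-nt protospacer immediately followed by an NGG PAM) and applies a simple GC-content filter; the function names and the 40-70% GC window are illustrative choices, not taken from any particular tool. It does not score off-target sites, which a real design tool would need to do.

```python
import re

# Assumption: SpCas9-style sites, i.e. 20 nt followed by an NGG PAM.
# Other nucleases use different PAMs and guide lengths.
GUIDE_LEN = 20
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_guides(seq, min_gc=0.40, max_gc=0.70):
    """Scan both strands for candidate guides and return them as dicts.

    `start` is the 0-based protospacer position on the scanned strand:
    '+' is the input sequence, '-' is its reverse complement.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % GUIDE_LEN)
    guides = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            protospacer, pam = match.group(1), match.group(2)
            gc = gc_fraction(protospacer)
            if min_gc <= gc <= max_gc:
                guides.append({
                    "strand": strand,
                    "start": match.start(),
                    "guide": protospacer,
                    "pam": pam,
                    "gc": round(gc, 2),
                })
    return guides
```

Calling `find_guides` on a gene sequence returns every candidate on both strands; ranking them by predicted efficiency and checking the rest of the genome for near-matches would be natural next steps for a hackathon project.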
What plant genomics datasets are available?
Leading Genomic & Multi-Omics Datasets:
1001 Genomes Project (Arabidopsis thaliana):
Offers whole-genome sequencing (WGS) data for 1,135 natural inbred accessions of Arabidopsis thaliana, with VCFs, SNP matrices, and pseudogenomes available for download. (1001genomes.org)
Includes epigenetic (methylome) and transcriptomic data: ~1,107 methylomes and ~1,203 transcriptomes from overlapping accessions. (PubMed, viennabiocenter.org, Cell)
This makes it a standout dataset for exploring the interplay between genome, epigenome, and transcriptome. (viennabiocenter.org, Cell, PubMed, Nature)
Community threads confirm:
“The Arabidopsis 1001 genomes project contains both WGBS and WGS data for many of their accessions.” (Reddit)
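For teams starting from the project's VCF downloads, a common first step is computing the alternate-allele frequency at each site across accessions. The sketch below parses standard VCF text lines using only the standard library. The toy records and the accession names (acc_A to acc_D) are made up for illustration and are not 1001 Genomes data; multiallelic sites are skipped to keep the example short.

```python
def alt_allele_frequencies(vcf_lines):
    """Yield (chrom, pos, ref, alt, alt_freq) for each biallelic record.

    Missing genotype calls ('.') are excluded from the denominator.
    """
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta lines, and the column header
        fields = line.split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        if "," in alt:
            continue  # multiallelic site: skipped in this sketch
        gt_index = fields[8].split(":").index("GT")
        alt_count = 0
        called = 0
        for sample in fields[9:]:
            genotype = sample.split(":")[gt_index]
            # Treat phased ('|') and unphased ('/') genotypes the same way.
            for allele in genotype.replace("|", "/").split("/"):
                if allele == ".":
                    continue
                called += 1
                if allele == "1":
                    alt_count += 1
        if called:
            yield chrom, int(pos), ref, alt, alt_count / called


# Toy input (hypothetical values, for illustration only).
HEADER = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
          "FORMAT", "acc_A", "acc_B", "acc_C", "acc_D"]
TOY_VCF = [
    "##fileformat=VCFv4.2",
    "\t".join(HEADER),
    "\t".join(["1", "1000", ".", "A", "G", ".", "PASS", ".", "GT",
               "0/0", "1/1", "1/1", "./."]),
    "\t".join(["1", "2000", ".", "C", "T,G", ".", "PASS", ".", "GT",
               "0/0", "1/1", "2/2", "0/0"]),
    "\t".join(["2", "500", ".", "T", "C", ".", "PASS", ".", "GT:DP",
               "0|0:12", "0|1:9", "0|0:14", "0|0:11"]),
]

if __name__ == "__main__":
    for record in alt_allele_frequencies(TOY_VCF):
        print(record)
```

Because the function accepts any iterable of lines, a compressed download can be streamed by passing it a file object opened with `gzip.open(path, "rt")`. For files the size of a full population panel, a dedicated VCF library would likely be faster than this line-by-line parser.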
AGP: Arabidopsis Genomics-Phenomics Dataset:
A multi-modal dataset released in August 2025 that integrates, for the first time, gene expression profiles with phenotypic trait measurements for the same Arabidopsis thaliana specimens.
Designed especially for machine learning tasks such as phenotype prediction and graph-based modeling. (arXiv)
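Before training a model on paired expression and phenotype data of this kind, a simple baseline is to rank genes by how strongly their expression correlates with a trait across specimens. The sketch below does this with a hand-written Pearson correlation. The data layout (a dict mapping gene names to per-specimen expression values, plus a list of trait values in the same specimen order) is an assumption made for illustration and is not the AGP dataset's actual format; the gene names and numbers are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence is constant, since the correlation
    is undefined in that case.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_genes(expression, trait):
    """Return (gene, r) pairs sorted by absolute correlation, strongest first."""
    scores = {gene: pearson(values, trait) for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: four specimens, three genes, one trait.
TRAIT = [1.0, 2.0, 3.0, 4.0]
EXPRESSION = {
    "gene_a": [2.0, 4.0, 6.0, 8.0],   # rises with the trait
    "gene_b": [5.0, 5.0, 5.0, 5.0],   # constant, so uninformative
    "gene_c": [4.0, 3.0, 2.5, 1.0],   # falls as the trait rises
}

if __name__ == "__main__":
    for gene, r in rank_genes(EXPRESSION, TRAIT):
        print(gene, round(r, 3))
```

A ranking like this is only a sanity check: with thousands of genes and few specimens, many correlations will be spurious, so held-out evaluation and multiple-testing correction matter before drawing conclusions.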
Foundational Databases for Comparative & Crop Genomics
Phytozome: A comprehensive repository of plant genomic data across numerous green plant species, well suited to comparative genomics thanks to its integrated annotations. (cd-genomics.com)
Ensembl Plants (part of Ensembl Genomes): Provides genome sequences and annotated data for dozens of key plant species such as Arabidopsis, rice, maize, wheat, and grape, all accessible through tools like BioMart. (Wikipedia, plants.ensembl.org)
Species‐Specific Databases:
TAIR (Arabidopsis): Genome annotations, gene function, variants, phenotypes, and seed stock information. Public data is released after one year; a subscription is required for immediate access. (Wikipedia)
SoyBase: In-depth SNP, genetic map, QTL, and functional genomic data for soybean (Glycine max). (Wikipedia)
SolGenomics Network, CassavaBase, CucurbitDB, YamBase, etc.: Tailored to specific crops and crop families, such as nightshades, cassava, cucurbits, yam, and others. (Boyce Thompson Institute)
PlantPan (Pan-Genomics): A pan-genome database spanning 195 genomes from 11 plant species, with extensive gene clusters, variation data, synteny, and functional annotations (GO, KEGG, TFs, etc.). (PMC)
PGP Repository (Plant Genomics and Phenomics): A German, EU-funded infrastructure for publishing large-scale, multi-domain plant genomics and phenomics data, with long-term, DOI-citable storage that is fully compliant with FAIR principles. (Wikipedia)
Earth BioGenome Project / 10KP Pilot: An ambitious planetary-scale effort. As of late 2024, 2,000+ plant species have chromosome-level assemblies. These ecosystem-scale datasets pave the way for large comparative studies.