

NEXT.BIO 2025 - Reinventing pigments and dyes through AI
In October 2025, A16Z #SFTechWeek will once again take over San Francisco. We are organizing NEXT.BIO, a world-class, week-long tech conference featuring the best startups, industry leaders, and investors in AI x BIO across a series of Tech Events, a unique Hackathon, and a DEMO Day 🚀
This event is part of our Tech Event series within the week-long conference.
🧩 Suggested Themes & Challenges:
What biomaterials datasets are available?
1. Experimental / Properties Data
Biomolecular Adsorption Database 2.0 (BAD2.0)
A specialized database capturing protein adsorption onto biomaterial surfaces, with detailed descriptors such as protein identity, surface properties, and environmental conditions. It holds over 865 fully quantitative adsorption records and was updated in 2024. (Source: Wikipedia)
Global Clinical Trials Dataset Involving Engineered Biomaterials
A structured dataset built from ClinicalTrials.gov, containing metadata such as biomaterial type, properties, application context, and trial attributes. Well suited to analyzing translational trends. (Source: Zenodo)
FTIR & Raman Datasets for Biomaterials with Drug Loading
Spectral datasets of biomaterials after loading with the drug carboplatin, useful for chemical and structural fingerprinting. (Source: Figshare)
GelMA/Alginate Hydrogels Fabrication Dataset
Published on Figshare in early 2025, this dataset covers blood vessel fabrication using hydrogels, with relevance to 3D biofabrication and property optimization. (Source: Figshare)
2. Text/NLP-Focused Corpora for Biomaterials Literature
Developed under the Horizon Europe BIOMATDB initiative, these corpora are invaluable if you're building NLP tools for extracting biomaterial-related information from scientific papers.
BIOMAT-NER: Contains annotations of biomaterial types, chemical substances, and trade names across ~4,553 training documents, plus validation and test sets. (Source: Zenodo)
BIOMAT-MONER: Focused on "manufactured object" entities (tools, devices, implants), with 750 training and 100 validation documents. (Source: Zenodo)
BIOMAT-CellNER: Annotated for cell types and cell lines interacting with biomaterials, with a train/validation split of similar size to BIOMAT-MONER. (Source: Zenodo)
BIOMAT-AnatNER: Focused on anatomical structures (tissues, organs) in a biomaterials context. (Source: Zenodo)
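For hackathon teams planning to work with NER corpora like the ones above, here is a minimal sketch of a common first step: collapsing token-level tags into entity spans. It assumes the annotations are available as BIO tags (B- for the start of an entity, I- for its continuation, O for everything else); the actual distribution format of the BIOMAT corpora should be checked on their Zenodo pages. The function name `bio_to_spans` and the label names `BIOMATERIAL` and `CELL` are hypothetical, chosen only for illustration.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse BIO tags into (entity_text, entity_type) pairs.

    Assumes BIO tagging; a stray I- tag that does not continue the
    current entity is treated like O and closes any open entity.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def close_entity() -> None:
        # Emit the entity collected so far, if any, and reset the state.
        nonlocal current_tokens, current_type
        if current_tokens and current_type is not None:
            spans.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            close_entity()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            close_entity()
    close_entity()
    return spans


# Hypothetical example sentence and labels, for illustration only.
tokens = ["GelMA", "hydrogels", "were", "seeded", "with", "HUVEC", "cells"]
tags = ["B-BIOMATERIAL", "I-BIOMATERIAL", "O", "O", "O", "B-CELL", "O"]
print(bio_to_spans(tokens, tags))
```

The resulting spans can then be counted, deduplicated, or linked to the experimental datasets listed in section 1.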
3. Computational & Structural Materials Data
Open Materials 2024 (OMat24)
A massive open dataset with over 110 million DFT calculations on inorganic materials. It is paired with pre-trained models (EquiformerV2) capable of predicting properties such as stability and formation energy, making it ideal for ML-driven materials discovery. (Sources: arXiv, Reddit)
"Awesome-Biomaterials-DataScience" GitHub List
A living resource aggregating databases and tools, including DEBBIE (Experimental Biomaterials), MatWeb, the Spider Silkome database, and more. Not a single dataset, but a goldmine for discovering resources. (Source: GitHub)
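For teams new to the computational data in section 3, formation energy is one of the properties mentioned for OMat24. A minimal sketch of the standard definition follows: the total energy of a structure minus the summed reference energies of its constituent elements, divided by the number of atoms. The function name and all numbers below are made-up placeholders, not values from OMat24 or any other dataset.

```python
from typing import Dict


def formation_energy_per_atom(
    total_energy: float,
    composition: Dict[str, int],
    reference_energies: Dict[str, float],
) -> float:
    """Return (E_total - sum_i n_i * E_ref_i) / N in the units of the inputs.

    composition maps element symbol -> atom count in the structure;
    reference_energies maps element symbol -> per-atom reference energy.
    """
    n_atoms = sum(composition.values())
    if n_atoms == 0:
        raise ValueError("composition must contain at least one atom")
    reference_total = sum(
        count * reference_energies[element]
        for element, count in composition.items()
    )
    return (total_energy - reference_total) / n_atoms


# Illustrative placeholder energies (eV), not real DFT results.
energy = formation_energy_per_atom(
    total_energy=-27.0,
    composition={"Ti": 1, "O": 2},
    reference_energies={"Ti": -8.0, "O": -5.0},
)
print(energy)  # (-27.0 - (-18.0)) / 3 = -3.0
```

A more negative value indicates a compound that is more favorable relative to its elemental references, which is why the quantity is often used as a first screening signal.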