

NEXT.BIO 2025 - Advancing AI-driven ingredients in Cosmetics
In October 2025, a16z #SFTechWeek will once again take over San Francisco. We are organizing NEXT.BIO, a world-class, week-long tech conference featuring the best startups, industry leaders, and investors in AI x BIO across a series of Tech Events, a unique Hackathon, and a DEMO Day 🚀
This event is part of our Tech Event series within the full-week conference.
🧩 Suggested Themes & Challenges:
Offer 3–4 focused tracks, drawing inspiration from existing hackathons such as the Harvard Rare Disease Hackathon:
Genomic Diagnostics in Dermatology
Create tools to analyze variants for skin disease diagnosis.
Spatial & Single-Cell Skin Omics
Use tools such as Tangram, CellChat, and spatialLIBD to explore skin spatial transcriptomics data (skincenter.uci.edu).
AI for Skin Lesion Analysis
Build apps or models for lesion classification, inspired by Skinskan’s melanoma detection model (GitHub).
Ethics, Equity & Data Privacy in Skin Genomics
Address biases in genomic datasets or explore the ethics and privacy implications in skin genetics analysis.
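For the ethics and equity track, one concrete starting point is to check whether a model's sensitivity differs across skin-tone groups, for example using self-reported Fitzpatrick skin types where a dataset provides them. The following is a minimal, hypothetical stdlib-only Python sketch; the `(group, y_true, y_pred)` record format and the toy values are assumptions made for illustration, not data from any real dataset or model.

```python
from collections import defaultdict

def recall_by_group(records):
    """Recall (sensitivity) for the positive class, computed separately per group.

    records: iterable of (group, y_true, y_pred) tuples with 0/1 labels.
    This tuple layout is a hypothetical format chosen for the sketch.
    Groups with no positive cases are skipped because recall is undefined there.
    """
    true_pos = defaultdict(int)
    positives = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            positives[group] += 1
            if y_pred == 1:
                true_pos[group] += 1
    return {g: true_pos[g] / positives[g] for g in positives}

def max_recall_gap(records):
    """Largest recall difference between any two groups (0.0 if fewer than two)."""
    recalls = recall_by_group(records)
    if len(recalls) < 2:
        return 0.0
    return max(recalls.values()) - min(recalls.values())

# Toy predictions (hypothetical); group = self-reported Fitzpatrick type.
toy = [("I", 1, 1), ("I", 1, 1), ("I", 0, 0),
       ("V", 1, 0), ("V", 1, 1), ("V", 0, 1)]
print(recall_by_group(toy))  # {'I': 1.0, 'V': 0.5}
print(max_recall_gap(toy))   # 0.5
```

Recall is used here because missed melanomas (false negatives) are usually the costlier error; a fuller audit would also compare specificity and calibration per group.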
Each track can offer 2–3 problem prompts, e.g., "Build a variant interpretation interface for vitiligo-related genes" or "Train a lightweight classifier for early detection of melanoma."
What skin datasets (imaging and genomic) are available?
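As a starting point for the variant-interpretation prompt, here is a minimal, hypothetical Python sketch that reads VCF data lines (tab-separated CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and keeps passing variants in a gene panel. It assumes an upstream annotation step has written one gene symbol into a hypothetical `GENE` INFO key; real annotators use richer layouts (e.g. VEP's `CSQ` or SnpEff's `ANN` fields). The panel genes are illustrative, and the VCF records and coordinates are invented toy values.

```python
# Illustrative gene panel (assumption): genes reported in vitiligo association
# studies; a real challenge would supply a curated, versioned list.
VITILIGO_PANEL = {"TYR", "NLRP1", "PTPN22"}

def parse_info(info):
    """Parse a VCF INFO column ("KEY=VALUE;FLAG;...") into a dict."""
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
        elif item and item != ".":
            fields[item] = True  # flag-type INFO entry with no value
    return fields

def filter_vcf(lines, panel, gene_key="GENE"):
    """Yield (chrom, pos, ref, alt, gene) for passing variants in panel genes.

    gene_key is a hypothetical INFO key holding one gene symbol per record.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, "##" meta-information and "#CHROM" header lines
        chrom, pos, _id, ref, alt, _qual, filt, info = line.rstrip("\n").split("\t")[:8]
        if filt not in ("PASS", "."):  # "." means no filters were applied
            continue
        gene = parse_info(info).get(gene_key)
        if gene in panel:
            yield chrom, int(pos), ref, alt, gene

# Toy VCF text; records and positions are made up for illustration.
toy_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr11\t1000\t.\tA\tG\t50\tPASS\tGENE=TYR\n"
    "chr17\t2000\t.\tC\tT\t12\tLowQual\tGENE=NLRP1\n"
    "chr2\t3000\t.\tG\tA\t60\tPASS\tGENE=OTHER;DB\n"
)
hits = list(filter_vcf(toy_vcf.splitlines(), VITILIGO_PANEL))
print(hits)  # [('chr11', 1000, 'A', 'G', 'TYR')]
```

An interface built for the prompt would then attach interpretation evidence (population frequency, predicted consequence, literature) to each hit; this sketch only covers the filtering step.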
1. Image-based Skin Datasets (Dermatologic Images)
These are widely used for computer vision, diagnostics, and ML model development:
ISIC Archive
A large and rich collection of dermatoscopic images incorporating multiple datasets: HAM10000 (10,015 images), BCN20000 (~19,424), Patient-Centric (~33,126), and others, totaling over 76,000 public images as of May 2024 (arXiv).
DERM12345
Offers 12,345 high-resolution dermatoscopic images with 38 carefully annotated subclasses, enhancing granularity beyond traditional datasets (arXiv).
HAM10000
A classic benchmark with 10,015 dermatoscopic images covering seven major diagnostic categories, many with pathology-confirmed labels (arXiv).
BCN20000
Comprises ~19,424 dermoscopic images, including challenging lesion types (nails, mucosal, large lesions), from a real clinical environment (arXiv).
PAD-UFES-20
Contains 2,298 clinical (smartphone) images representing six diagnostic categories, with 58% biopsy-confirmed lesions (arXiv). Great for real-world, non-dermoscopic scenarios.
Hyper-Skin
A hyperspectral dataset (330 hyperspectral cubes from 51 subjects) enabling advanced research on skin reflectance, melanin, and hemoglobin, paired with synthetic RGB images (arXiv).
SCIN (Skin Condition Image Network)
Google’s crowd-sourced dataset: over 10,000 images from volunteer contributions, annotated with dermatologist labels, self-reported Fitzpatrick skin types, and Monk Skin Tone estimates. A valuable dataset for diversity and metadata-rich analysis (GitHub).
Others:
DermaMNIST, SD-198, Fitzpatrick17k, Derm7pt, etc., are summarized in the "awesome-skin-image-analysis-datasets" list on GitHub.
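Before training a classifier on HAM10000, note that the dataset can contain several images of the same lesion, so a random image-level split may place views of one lesion in both train and test and inflate test scores. The following is a minimal, hypothetical stdlib-only Python sketch of a lesion-level split. It assumes the metadata CSV has `lesion_id`, `image_id`, and `dx` columns, as in commonly distributed copies of HAM10000 (verify against your own download); the sample rows and IDs are invented.

```python
import csv
import hashlib
import io
from collections import Counter

def split_of(lesion_id, test_fraction=0.2):
    """Deterministically map a lesion ID to 'train' or 'test'.

    Uses a SHA-256 digest rather than Python's built-in hash(), which is
    randomized per process for strings.
    """
    bucket = int(hashlib.sha256(lesion_id.encode("utf-8")).hexdigest(), 16) % 100
    return "test" if bucket < round(test_fraction * 100) else "train"

def lesion_level_split(csv_text, test_fraction=0.2):
    """Group images by lesion_id so that no lesion appears in both splits.

    Returns ({split: [image_id, ...]}, {split: Counter of dx labels}).
    """
    images = {"train": [], "test": []}
    classes = {"train": Counter(), "test": Counter()}
    for row in csv.DictReader(io.StringIO(csv_text)):
        split = split_of(row["lesion_id"], test_fraction)
        images[split].append(row["image_id"])
        classes[split][row["dx"]] += 1
    return images, classes

# Toy rows in the assumed column layout; IDs are made up.
SAMPLE = """lesion_id,image_id,dx,dx_type,age,sex,localization
L1,img1,mel,histo,60,male,back
L1,img2,mel,histo,60,male,back
L2,img3,nv,follow_up,35,female,trunk
L3,img4,bkl,consensus,70,male,face
"""
images, classes = lesion_level_split(SAMPLE)
print(images)
print(classes)
```

Hashing the lesion ID keeps the split reproducible across runs without storing a split file; the per-split `Counter` makes class imbalance (e.g. few melanoma cases in test) visible before training.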
2. Molecular & Genomic Skin Datasets
Less common but critical for understanding gene expression, proteomics, and cellular composition in skin:
GTEx (Genotype-Tissue Expression project)
A foundational resource with gene expression data (plus genotypes, QTLs, and histology) across 54 non-diseased human tissue sites, including skin (PMC).
Human Skinatlas
A spatially resolved quantitative proteomics atlas, created with flow cytometry and mass spectrometry, detailing the different layers and cell types of healthy human skin (PMC).
Expression Atlas (EMBL-EBI)
Offers standardized gene (and protein) expression data across conditions, including baseline and differential expression, searchable by tissue or disease (Wikipedia).
Human Protein Atlas
Maps gene and protein expression in 44 human tissues (including skin) at the mRNA and protein levels, with single-cell and immunohistochemistry data (Wikipedia).
FANTOM5
Transcriptome profiling via CAGE across a large array of human tissues and primary cells; its 1,816 human profiles may include skin samples (Wikipedia).
Bgee database
Curated gene expression data (RNA-Seq, scRNA-Seq, microarray) across species, used to identify where genes are expressed under healthy conditions; includes human skin (Wikipedia).
Single-cell & Multi-omics Resources (e.g., via skinregeneration.org)
Various mouse and human skin scRNA-seq and scATAC-seq datasets, including samples from diseased and regenerative skin, fat cell subsets, eccrine glands, and more (skinregeneration.org).
Targeted In Situ Panels
E.g., the 10x Genomics Xenium Human Skin Gene Expression Panel, targeting 282 genes across skin cell types (keratinocytes, melanocytes, fibroblasts, immune cells) with spatial resolution (10x Genomics).
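A simple first pass at cell-type annotation for targeted panel or single-cell data is marker-gene scoring: score each cell against a few known markers per cell type and take the best match. The following is a deliberately minimal, hypothetical stdlib Python sketch of that idea; it is not the method used by Tangram, CellChat, spatialLIBD, or any 10x Genomics software. The marker lists are illustrative canonical markers chosen for this example and are not claimed to match the Xenium panel contents, and the per-cell counts are invented.

```python
import math

# Illustrative canonical markers (assumption); not taken from any panel definition.
MARKERS = {
    "keratinocyte": ["KRT5", "KRT14"],
    "melanocyte": ["PMEL", "MLANA"],
    "fibroblast": ["COL1A1", "DCN"],
    "immune": ["PTPRC"],
}

def score_cell(counts, markers=MARKERS):
    """Mean log1p expression of each marker set for one cell (counts: gene -> count)."""
    return {
        cell_type: sum(math.log1p(counts.get(g, 0)) for g in genes) / len(genes)
        for cell_type, genes in markers.items()
    }

def annotate(cells, markers=MARKERS):
    """Label each cell with its highest-scoring type, or 'unassigned' if no marker is detected."""
    labels = {}
    for cell_id, counts in cells.items():
        scores = score_cell(counts, markers)
        best = max(scores, key=scores.get)
        labels[cell_id] = best if scores[best] > 0 else "unassigned"
    return labels

# Toy per-cell transcript counts (hypothetical).
cells = {
    "cell_1": {"KRT14": 12, "KRT5": 8},
    "cell_2": {"PMEL": 5, "MLANA": 3, "KRT14": 1},
    "cell_3": {"ACTB": 20},
}
print(annotate(cells))
# {'cell_1': 'keratinocyte', 'cell_2': 'melanocyte', 'cell_3': 'unassigned'}
```

The `log1p` transform damps the influence of a single very highly expressed marker; production pipelines typically also normalize for total counts per cell and use reference-based methods for finer subtypes.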