

Inside Harvard’s Data.gov Archive – A Conversation with Jack Cushman
Join us for a conversation with Jack Cushman from the Harvard Law School Library Innovation Lab about their new archive of Data.gov—more than 311,000 datasets harvested in 2024–2025, updated daily, and published on Source Cooperative.
We’ll dig into two threads:
- BagIt for durability: How Library of Congress–standard packaging, checksums, and signatures support authenticity, provenance, and long-term citation.
- Discovery without a server: How browser-based querying over static data makes 17.9 TB of datasets findable and fast to explore.
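For anyone new to BagIt, the checksum idea in the first thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Library Innovation Lab's tooling: it assumes a bag directory containing a `manifest-sha256.txt` file whose lines pair a checksum with a payload path (as the BagIt specification describes), and the function names `sha256_of` and `verify_bag` are hypothetical. It does not check signatures or tag manifests, and it ignores the percent-encoding BagIt applies to unusual characters in file paths.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large payload files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bag(bag_dir: Path) -> list[str]:
    """Return the payload paths that are missing or fail their checksum.

    Assumes bag_dir holds a manifest-sha256.txt with lines of the form
    "<checksum> <relative/path>"; other manifest algorithms are not handled.
    """
    failures = []
    manifest = bag_dir / "manifest-sha256.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split only once so paths containing spaces stay intact.
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(rel_path)
    return failures
```

An empty list from `verify_bag(Path("some-bag"))` would mean every listed payload file is present and matches its recorded checksum, which is the basic fixity guarantee that makes a bag citable over the long term.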
We’ll also talk about practical choices that matter when you’re archiving government data: what to bag, what metadata to preserve, how to track change over time, and how to make it usable for researchers, journalists, and agencies.
Resources
- Announcing the Data.gov Archive
- Welcome to LIL’s Data.gov Archive Search