Wednesday, April 22, 2026

Datapages for reusable (and pretty!) data sharing

[This post is joint with Mika Braginsky, a long-time collaborator on data-sharing and data-viz.]

Data sharing is both a critical scientific need and, increasingly, a mandate from many research funders. The FAIR principles – that data should be findable, accessible, interoperable, and reusable – are an important guide to how data are shared. Yet even FAIR-compliant datasets in approved repositories are often shared in ad-hoc formats that are hard to reuse or to integrate with other data. In contrast, the most impactful datasets tend to be disseminated thoughtfully through dataset-specific or community-specific platforms. These “domain-specific data repositories” (this was our term from a previous blogpost!) make it possible to develop data standards and ontologies that fit the needs of a particular community, research problem, or instrument type. They also create opportunities for engagement through interactive visualizations. But custom repositories and pretty websites with nice visualizations are costly and complicated to create.

We are introducing a set of open-source tools and templates for easily creating datapages, interactive websites to disseminate data for broad reuse. Datapages are easy to deploy for a single project, but extensible enough to host large collections of related datasets. You can learn more and get started at https://datapages.github.io/.

A gallery of datapages that we've built for datasets from psychology, linguistics, economics, education, and atmospheric science (https://datapages.github.io).

The basic idea is that a datapage is an interface to a dataset – a way that users can learn more about it and access it as seamlessly as possible. This functionality is enabled by storing the data in Redivis, a data platform for academic research which provides datasets with versioning, flexible access controls, and multi-platform API access. The datapage itself is a website (built using the Quarto publishing system) that is connected to one or more Redivis datasets and is automatically populated with interactive visualizations, an embedded data browser, documentation on programmatic access, and information on citation, DOI, metadata, and more. An automated deployment pipeline renders this website and publishes it as a static site, so it can be hosted on a serverless platform such as GitHub Pages.

Our hope is that a user without much technical expertise (or who just wants the default use case) can create a datapage for their dataset without writing any code or setting up any deployment infrastructure. More advanced users can customize their datapage with custom visualizations, theming, and any content they want to add.

We've created a diverse array of datapages for datasets from psychology, linguistics, economics, education, and atmospheric science. The template datapage uses Palmer Penguins (https://allisonhorst.github.io/palmerpenguins), a dataset often used as an example for visualization. A few of these datapages represent domain-specific data repositories, i.e., collections of harmonized data from multiple sources: the Item Response Warehouse (item responses for psychometric modeling), Refbank (conversations in referential communication tasks), and Numberbank (experimental results on children's numerical language). Others provide interfaces for complex single-source datasets: Project Loon (weather balloon flight paths) and LEVANTE (global data collection on learning and development). And several contain data that accompany a specific publication: Harmonized Learning Outcomes (global education metrics), ManyBabies 1 (multi-site replication of infant-directed speech preference), and State Incentives Map (disclosure laws about state subsidy programs).

We are excited to share this project, and hope that people will find new and creative ways to use this toolset to disseminate their data. Don't hesitate to get in touch if you're interested in talking more about the project.
