The National Institutes of Health (NIH) has long recognized that the value of biomedical research is amplified when data generated with federal funding are made widely available. Sharing data accelerates scientific discovery, promotes reproducibility, reduces unnecessary duplication of effort, and maximizes the return on public investment. The NIH Data Sharing Policy, first introduced in 2003 and revised several times since, establishes a baseline expectation that investigators will make their data accessible to the broader research community unless a valid reason for restriction exists.
Every new NIH grant application must include a Data Management Plan (DMP) that addresses the following elements:
Data should be shared no later than the date of publication or the end of the grant period, whichever occurs first. For large datasets, investigators may request a reasonable embargo period of up to 12 months so that they can publish their initial findings.
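The timing rule above can be sketched as a small date calculation. This is an illustrative sketch only: the function names are hypothetical, and it assumes that an approved embargo is counted from the baseline deadline, which the policy text does not spell out.

```python
import calendar
from datetime import date
from typing import Optional

MAX_EMBARGO_MONTHS = 12  # upper bound on the embargo stated in the text


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole months, clamping the day to the month's end."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def sharing_deadline(publication: Optional[date], grant_end: date,
                     embargo_months: int = 0) -> date:
    """Return the latest date by which data should be shared.

    Baseline: the publication date or the grant end date, whichever is first.
    Assumption: an approved embargo is added on top of that baseline.
    """
    if not 0 <= embargo_months <= MAX_EMBARGO_MONTHS:
        raise ValueError("embargo must be between 0 and 12 months")
    baseline = grant_end if publication is None else min(publication, grant_end)
    return add_months(baseline, embargo_months)
```

For example, with a publication date of 15 March 2025 and a grant ending on 30 June 2026, `sharing_deadline(date(2025, 3, 15), date(2026, 6, 30))` returns the publication date, because it occurs first.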
NIH expects data to be deposited in a trusted, publicly accessible repository that:
Popular repositories include NCBI, Dataverse, Figshare, and discipline-specific archives such as the BioProject portal.
When data involve privacy-sensitive information about human subjects, NIH requires that they be stored in controlled-access repositories (e.g., dbGaP, the European Genome-phenome Archive). Researchers must submit a data use agreement (DUA) that specifies who may request access and under what conditions.
Good documentation is essential for reuse. The DMP should describe the metadata standards (e.g., MIAME for microarray data, DICOM for imaging) and provide clear data dictionaries, codebooks, and any software needed to interpret the data.
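As one possible illustration of a data dictionary, the sketch below keeps variable definitions in code, exports them as a CSV codebook, and flags dataset columns that lack documentation. The variable names, value ranges, and helper functions are hypothetical examples, not part of any NIH or repository standard.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class Variable:
    name: str
    description: str
    dtype: str
    units: str
    allowed_values: str


# Hypothetical entries, for illustration only.
DATA_DICTIONARY = [
    Variable("participant_id", "De-identified participant code", "string", "", ""),
    Variable("age_years", "Age at enrollment", "integer", "years", "18-90"),
    Variable("sex", "Self-reported sex", "categorical", "", "F|M|U"),
]


def dictionary_to_csv(variables) -> str:
    """Serialize the data dictionary as CSV text with one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(Variable)],
        lineterminator="\n",
    )
    writer.writeheader()
    for variable in variables:
        writer.writerow(asdict(variable))
    return buffer.getvalue()


def undocumented_columns(data_columns, variables):
    """Return dataset columns that have no entry in the data dictionary."""
    documented = {variable.name for variable in variables}
    return [column for column in data_columns if column not in documented]
```

Running `undocumented_columns` against a dataset's header row before deposit is a cheap way to catch variables that were added late and never described.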
NIH staff may request evidence of compliance (e.g., repository accession numbers) during progress reports or at the time of the final grant closeout. Failure to share data as promised can affect future funding eligibility.
NIH acknowledges that certain circumstances justify limited sharing:
In these cases, investigators must detail the limitation in the DMP and provide a justification that is reviewed by the funding institute.
The policy does not apply to:
Incorporate the DMP into the grant proposal narrative from day one. Identify the appropriate repository, verify its data-format requirements, and budget for any required storage fees.
Many universities provide data stewardship support, including help with metadata standards, selection of repositories, and compliance monitoring. Leverage these resources to avoid last-minute surprises.
Use tools such as Research Objects or workflow managers (e.g., Snakemake, Nextflow) to generate reproducible pipelines that automatically produce both data and accompanying documentation.
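The idea of producing data and documentation in the same step can be shown without any workflow manager. The sketch below is plain Python, not Snakemake or Nextflow syntax; `run_step` and `drop_missing` are hypothetical helpers, and the sketch assumes the step's output can be serialized as JSON.

```python
import hashlib
import json


def run_step(name, func, inputs):
    """Run one processing step and return its output with a provenance record."""
    outputs = func(inputs)
    # Hash a canonical JSON form so the same output always yields the same digest.
    payload = json.dumps(outputs, sort_keys=True).encode("utf-8")
    provenance = {
        "step": name,
        "n_inputs": len(inputs),
        "n_outputs": len(outputs),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }
    return outputs, provenance


def drop_missing(rows):
    """Example step: keep only rows whose 'value' field is present."""
    return [row for row in rows if row.get("value") is not None]


if __name__ == "__main__":
    raw = [{"id": 1, "value": 3.2}, {"id": 2, "value": None}]
    cleaned, record = run_step("drop_missing", drop_missing, raw)
    print(json.dumps(record, indent=2))
```

A real workflow manager adds dependency tracking and re-execution on change; the point here is only that the provenance record is generated by the pipeline itself rather than written by hand afterwards.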
Explain the scientific rationale behind the data collection, any preprocessing steps, and how variables were derived. Future users will appreciate the context.
Choose repositories that commit to preserving data for at least 10 years. If the chosen repository is discipline-specific, confirm that it has a sustainability plan.
For controlled-access datasets, designate a data steward who can review DUA applications, ensure compliance with consent terms, and keep records of who accessed the data and when.
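A data steward's access record can start out as a very small structure. The following is a minimal sketch assuming an in-memory log; the class names, field names, and dataset identifier are hypothetical, and a production system would persist entries and restrict who can write them.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AccessRecord:
    requester: str
    dataset_id: str
    approved: bool
    timestamp: datetime


@dataclass
class AccessLog:
    records: List[AccessRecord] = field(default_factory=list)

    def record(self, requester: str, dataset_id: str, approved: bool,
               when: Optional[datetime] = None) -> AccessRecord:
        """Append one access decision, timestamped in UTC unless a time is given."""
        entry = AccessRecord(requester, dataset_id, approved,
                             when or datetime.now(timezone.utc))
        self.records.append(entry)
        return entry

    def who_accessed(self, dataset_id: str) -> List[Tuple[str, datetime]]:
        """List (requester, timestamp) pairs for approved access to a dataset."""
        return [(r.requester, r.timestamp) for r in self.records
                if r.dataset_id == dataset_id and r.approved]
```

Denied requests are kept in the log as well, so the steward can show how each DUA application was handled, while `who_accessed` reports only approved access.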
