On May 20th, the US National Science Foundation issued a new Dear Colleague letter to "describe — and encourage — effective practices for managing research data, including the use of persistent identifiers (IDs) for data and machine-readable data management plans (DMPs)."
In theory, the explosion of scientific data being generated across all research fields is a boon for researchers looking to draw on high-quality data from various sources. In practice, it is often difficult for researchers to take full advantage of this situation because protocols, policies, and best practices for managing data are inconsistent. As scientific data generation shows no signs of slowing down, the problem is becoming increasingly acute. To address these challenges, the principles of FAIR data (Findable, Accessible, Interoperable, Reusable) were first formally described in 2016 by a working group representing academic and industrial institutions, funding organizations, and scientific publishers. Since then, the topic of FAIR data has received increasing attention.
In this context, the NSF's new guidelines are a welcome addition. Focusing specifically on the benefits of persistent identifiers and machine-readable data management plans, the NSF's letter states:
"A benefit of a persistent ID for research data is that the dataset can be cited in a researcher's NSF biographical sketch, as previously noted, as well as in the "results of prior research" section of future grant proposals. Use of persistent IDs confer other long-term benefits as well. For example, information about a dataset can be findable even though the dataset itself is no longer accessible....
When written effectively, DMPs clarify how researchers will effectively disseminate and share research results, data, and associated materials. However, DMPs can also contain complex and/or ambiguous terms that produce uncertainty about the benefits of data management activities. Such ambiguity can produce situations where the DMP does not adequately explain what data will be created or where the data will be deposited.
For this reason, NSF encourages the use of DMP tools, such as EZDMP or the DMPTool, to create machine-readable DMPs. The DMP specifies how data will be produced, prepared, curated, and stored. A machine-readable document allows a computer program to interpret the DMP, such as to prepare a data repository for an eventual deposit of a large or complicated dataset."
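To make the idea of a program "interpreting" a DMP concrete, here is a minimal sketch in Python. It assumes a DMP stored as JSON; the field names (`datasets`, `persistent_id`, `expected_bytes`, `repository`), the placeholder DOI, and the `plan_deposits` helper are all hypothetical and are not taken from the NSF letter, EZDMP, the DMPTool, or any published schema.

```python
import json

# Hypothetical machine-readable DMP. Every field name and value below is
# illustrative only, including the placeholder DOI.
DMP_JSON = """
{
  "title": "Example project DMP",
  "datasets": [
    {
      "name": "sensor-readings",
      "persistent_id": "doi:10.0000/example.1",
      "expected_bytes": 5000000000,
      "repository": "example-repo"
    },
    {
      "name": "survey-responses",
      "persistent_id": null,
      "expected_bytes": 20000000,
      "repository": "example-repo"
    }
  ]
}
"""


def plan_deposits(dmp_text, large_threshold_bytes=1_000_000_000):
    """Read a DMP and return one deposit plan per declared dataset.

    The threshold for a "large" deposit is an arbitrary assumption.
    """
    dmp = json.loads(dmp_text)
    plans = []
    for dataset in dmp["datasets"]:
        plans.append({
            "name": dataset["name"],
            "repository": dataset["repository"],
            # Flag datasets that still need a persistent ID assigned.
            "needs_persistent_id": dataset["persistent_id"] is None,
            # Storage the repository could reserve ahead of the deposit.
            "reserve_bytes": dataset["expected_bytes"],
            "large_deposit": dataset["expected_bytes"] >= large_threshold_bytes,
        })
    return plans


if __name__ == "__main__":
    for plan in plan_deposits(DMP_JSON):
        print(plan)
```

Because the plan is structured data rather than free text, a repository could in principle run a check like this well before the deposit arrives, which is the kind of preparation the letter describes.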
As the scientific research infrastructure transitions to a data-intensive and ultimately open environment, effective data management will become a critical pillar. The NSF's new guidelines go a long way toward supporting this endeavor.