Effective Practices for Data from the NSF

The National Science Foundation issues new guidelines for managing research data

On May 20th, the US National Science Foundation issued a new Dear Colleague letter to "describe — and encourage — effective practices for managing research data, including the use of persistent identifiers (IDs) for data and machine-readable data management plans (DMPs)."

In theory, the explosion of scientific data across all research fields is a boon for researchers who want to draw on high-quality data from varied sources. In practice, researchers often struggle to take full advantage of it because protocols, policies, and best practices for managing data are inconsistent. With data generation showing no sign of slowing, the problem is becoming more acute. To address these challenges, the FAIR data principles (Findable, Accessible, Interoperable, Reusable) were first formally described in 2016 by a working group representing academic and industrial institutions, funding organizations, and scientific publishers. Since then, FAIR data has received increasing attention.

In this context, the NSF's new guidelines are a welcome and valuable addition. Focusing specifically on the benefits of persistent identifiers and machine-readable data management plans, the NSF's letter states:

"A benefit of a persistent ID for research data is that the dataset can be cited in a researcher's NSF biographical sketch, as previously noted, as well as in the "results of prior research" section of future grant proposals. Use of persistent IDs confer other long-term benefits as well. For example, information about a dataset can be findable even though the dataset itself is no longer accessible....

When written effectively, DMPs clarify how researchers will effectively disseminate and share research results, data, and associated materials. However, DMPs can also contain complex and/or ambiguous terms that produce uncertainty about the benefits of data management activities. Such ambiguity can produce situations where the DMP does not adequately explain what data will be created or where the data will be deposited.

For this reason, NSF encourages the use of DMP tools, such as EZDMP or the DMPTool, to create machine-readable DMPs. The DMP specifies how data will be produced, prepared, curated, and stored. A machine-readable document allows a computer program to interpret the DMP, such as to prepare a data repository for an eventual deposit of a large or complicated dataset."
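
To make the idea of a machine-readable DMP concrete, here is a minimal sketch in Python. It assumes a DMP serialized as JSON; the field names (`datasets`, `persistent_id`, `repository`, `expected_size_gb`), the repository name, and the DOI are all hypothetical placeholders and do not follow any official schema or the output of any particular DMP tool. The sketch shows the kind of question a program could answer from such a file: how much storage each repository should expect, and which planned datasets still lack a persistent ID.

```python
import json

# Hypothetical machine-readable DMP. Field names and values are illustrative
# assumptions only; they are not taken from any official DMP schema.
DMP_JSON = """
{
  "title": "Example project DMP",
  "datasets": [
    {"name": "Raw spectra", "persistent_id": "doi:10.1234/example.1",
     "repository": "Example Repository", "expected_size_gb": 250},
    {"name": "Processed tables", "persistent_id": null,
     "repository": "Example Repository", "expected_size_gb": 2}
  ]
}
"""


def storage_needed_by_repository(dmp):
    """Sum the expected dataset sizes (in GB) planned for each repository."""
    totals = {}
    for ds in dmp["datasets"]:
        repo = ds["repository"]
        totals[repo] = totals.get(repo, 0) + ds["expected_size_gb"]
    return totals


def datasets_missing_id(dmp):
    """Return the names of datasets that have no persistent identifier yet."""
    return [ds["name"] for ds in dmp["datasets"] if not ds.get("persistent_id")]


dmp = json.loads(DMP_JSON)
print(storage_needed_by_repository(dmp))
print(datasets_missing_id(dmp))
```

A repository could use the first result to reserve space ahead of a large deposit, and a researcher could use the second as a checklist of datasets that still need an identifier before they can be cited.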

As the scientific research infrastructure transitions to a data-intensive and ultimately open environment, effective data management will become a critical pillar. The NSF's new guidelines go a long way toward supporting this endeavor.

Robin Padilla

Director of Product Management, Springer Nature

I'm the product director for Springer Nature Experiments, the specialized platform to find, evaluate, and implement lab protocols and methods. I'm a chemist by training, with a Ph.D. from Berkeley and postdoctoral experience at BASF's Catalysis Research Lab at the University of Heidelberg. Transitioning to publishing, I worked in editorial roles at various publishers before joining the product management team in Springer Nature's Data and Analytics Solutions group.