Here there be software

Software plays an important role in scientific research, and published studies increasingly rely on custom code developed by their authors. This calls for better transparency in research articles and improved access to the software and code itself.

This month, in Nature Methods and on methagora, we revisit issues of software reporting and availability that we first raised exactly seven years ago in our March 2007 Editorial “Social software”. Our March 2014 Editorial updates and expands on our editorial policies, and a blog post details our guidelines for custom algorithms and software reported in Nature Methods research papers. We encourage researchers to read these, particularly those considering submitting to us a manuscript that uses or reports custom software. We also hope that publicizing our editorial policies might help other journals think about how to handle the algorithms and software associated with the research they publish.

Of course, these efforts are only one small part of what needs to be done to improve access to and use of scientific research software. As our somewhat complex guidelines show, it is difficult to establish simple rules that are sensible and fair for all cases and all communities. Community participation will be essential for refining and improving how software is handled.

Nature Methods currently relies on Supplementary Software zip files for authors to supply the software and code underlying research articles. This isn’t pretty, but it meets our basic needs: 50% of the research articles in our March issue contain Supplementary Software files. But better methods are needed to archive and document code and to establish its provenance.

An important initiative in this regard is the “Code as a research object” project, a collaboration between Mozilla Science Lab, GitHub and figshare that seeks to “better integrate code and scientific software into the scholarly workflow.” The aim is to create citable endpoints for the exact code used in particular studies. [Full disclosure: figshare is a product of Digital Science which, like Nature Methods, is part of Macmillan Publishers.]

The project is still in its early stages and follows on from the similar but broader Research Object community project. GigaScience and F1000Research are also experimenting with archiving code and pipelines with DOIs.

We applaud these efforts and encourage the broader research community to participate in them. The ongoing discussion about what is needed for code reuse, announced on the Science Lab blog and taking place in a thread on GitHub, would benefit greatly from more input from researchers who don’t consider themselves code jockeys.

In an ideal world, many sophisticated and powerful things could be done to facilitate code exposure and reuse, but the situation at the great majority of journals is so underdeveloped, and the needs so acute, that even small, flexible steps forward will have a positive impact. Most important is to put in place facilities that allow and encourage the entire community to move forward, not just a small portion of it.