Jun 27, 2014

Community bioinformatics challenges help drive methods development.

Science moves ahead faster as a social enterprise, perhaps especially so in the dynamic area of bioinformatics. Bioinformatics competitions are important opportunities for developers (and users) to come together to define the essential questions in the field and decide on the best metrics for evaluating solutions. They also perform a critical function in making valuable benchmark datasets available to anyone, including small labs and young students.

Ideally, a challenge ends with a better appreciation of the most promising approaches to a problem, a recognition of difficulties and opportunities for future development, and new contacts for collaboration. As a reality check, researchers with directly competing methods are less likely to collaborate, because their work depends on a competitive funding model. But complementary approaches provide fertile ground for exploring new ways to attack a problem, and some contests now directly encourage collaborative coding.

In our July Editorial, we continue our support of these initiatives, urging participation and the adoption of formats that maximize engagement among participants. Measures such as online forums, webinars and conferences already involve participants in the planning and interpretation stages, which are critical for getting the most out of each event.

A variety of formats beyond the traditional bake-off are evolving in this collaborative spirit, encouraging more sharing of ideas and code. For example, hackathons tackle focused coding challenges in a single dedicated meet-up session, while open-source competitions make code available during the contest so that researchers can learn from each other. These formats are not meant to evaluate existing methods but to promote new solutions. As Gustavo Stolovitzky of the DREAM challenges points out, publishing code during the event can lead to ‘herding’ behavior (copying the leader), which can stifle creativity and produce a coding monoculture. A number of DREAM challenges now use a two-stage approach in which top performers from a traditional competition phase are invited back to develop a new and better solution together.

Journals and funders also play a role in supporting these efforts. Nature Methods has published a number of papers resulting from community competitions (CAFA, DREAM, FlowCAP, Particle tracking and RGASP), and since January 2013 the Nature journals have been committed to publishing such papers under a Creative Commons Attribution-NonCommercial-ShareAlike Unported license.

Running large-scale events also poses difficulties. The choice of data set and metrics can bias evaluations toward certain solutions, and the involvement of many developers can water down the conclusions drawn from the challenge. Moreover, usability is often not considered because it is hard to quantify. Ultimately, these issues can be mitigated by broadening participation in decision-making during the planning stages, tailoring conclusions to each scenario tested, and having judging panels test the best-performing methods for usability.

We are heartened to see the continued success of community-led competitions and the birth of contests in new areas. In a guest post, we invited organizers of the CAMI competition to announce their upcoming event on metagenome data interpretation.

Below, we provide a non-comprehensive list of some recent and ongoing challenges:

Assemblathon – genome assembly
CAFA (Critical Assessment of Function Annotation) – protein functional prediction
CAGI (Critical Assessment of Genome Interpretation) – functional variant prediction
CAMI (Critical Assessment of Metagenome Interpretation) – see the announcement
CAPRI (Critical Assessment of PRediction of Interactions) – structure-based protein-protein interaction prediction
CASP (Critical Assessment of protein Structure Prediction) – protein structure prediction since 1994!
DREAM (Dialogue for Reverse Engineering Assessment and Methods) – systems biology challenges with hybrid formats and challenge-assisted review
FlowCAP (Flow Cytometry: Critical Assessment of Population Identification Methods)
Grand Challenges in biomedical image analysis
Particle tracking challenge
RGASP (RNA-seq Genome Annotation Assessment Project)

Crowdsourcing competitions, hackathons and fast challenges
BioHackathons – open-source programming meetups
InnoCentive – commercial platform offering cash prizes (e.g. the $1 million US Defense Threat Reduction Agency (DTRA) challenge to identify organisms from a stream of DNA sequences)
DNA60IFX – short challenges based on DNA or RNA sequence data
DREAM – a number of recent and current challenges include a collaborative phase of tool development
Neurosynth hackathons – open-source programming meetups in computational neurobiology
Sequence Squeeze – open-source competition for sequence file compression (cash prize)
[topcoder] – variety of computational challenges, with some cash prizes

