Critical Assessment of Metagenome Interpretation – the second round of benchmarking challenges provides a comprehensive overview of metagenomic software performance


The Critical Assessment of Metagenome Interpretation (CAMI) initiative brings together the metagenomics research community to facilitate the benchmarking of metagenomic computational methods, promote standards and good practices, and further accelerate advancements in this rapidly evolving field of bioinformatics. In its latest work, CAMI reports the results of its second round of challenges (CAMI II), benchmarking state-of-the-art methods using metrics and procedures agreed upon by the community, identifying issues, and guiding researchers in the selection of the best methods for their analyses. CAMI takes this opportunity to sincerely thank the more than 100 collaborators for this huge effort.


Our planet is inhabited by an enormous number of different microbial organisms that live together in communities, ranging from very simple to highly complex in the number of taxa they contain, on every accessible surface. Metagenomics, the direct sequencing of genetic material isolated from such communities, allows us to also study community members that are difficult to obtain in pure culture, which is true for most microbial phyla (1). This has created new opportunities to study microbial communities, to explore environments and applications, and has driven enormous advances in the field over the last decade. Along with these experimental developments arose a need for techniques to process and interpret the resulting large amounts of data, which are frequently mixtures of short pieces of DNA sequence (reads) from completely different microorganisms and contain the typical errors of the sequencing technology. The corresponding “toolscape” of computational metagenomics includes a wide variety of methods and is evolving rapidly in parallel with sequencing technologies, which keep producing larger amounts of data and longer reads with fewer errors at lower cost. For researchers to be able to choose the most suitable methods for their research, comprehensive and continuous benchmarking of computational metagenomics methods is therefore needed to provide an up-to-date overview of the performance of these techniques in different applications.

Such comprehensive benchmarking was missing in the field, and it was further hampered by the lack of established performance metrics, common data formats across methods, and shared benchmarking procedures and datasets. This was an issue for both method users and developers, who had to implement their own performance analysis frameworks, with bias-prone results.

Foundation of the CAMI initiative and the first round of challenges

Aware of these difficulties, Alice McHardy, Alex Sczyrba, and Thomas Rattei founded the community-led initiative known as the Critical Assessment of Metagenome Interpretation (CAMI) in 2014 (2) for the comprehensive and unbiased evaluation of methods and the promotion of best practices in the field. In CAMI, benchmarking design decisions are made upon agreement with the research community, for instance, in discussions at workshops and conferences (e.g., at the Isaac Newton Institute in Cambridge in 2014, at the University of Maryland in 2017, in Braunschweig in 2020, and regularly at ISMB conferences; see the CAMI website for upcoming events).

To further motivate community participation, CAMI launched its first benchmarking challenge in 2015. Participants were invited to download novel metagenomic datasets derived from newly sequenced data of varying complexities in terms of the number of samples, the number of microorganisms represented, and their abundances. They could then submit the results of applying their methods to these data for one of the following major tasks in a metagenomics pipeline: metagenome assembly (piecing together the sequence reads into longer sequences of individual genomes), genome binning (grouping sequences from the same genome to recover the genomes), taxonomic binning (grouping sequences and labeling the groups with taxonomic identifiers), and taxonomic profiling (quantifying the presence and relative abundances of the taxa in microbial communities). The challenge concluded successfully with widespread participation and over 200 result submissions, which were assessed and compiled in (3).
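To give a flavor of how a task like taxonomic profiling can be scored, here is a minimal sketch of one common idea: comparing a predicted relative-abundance profile against the gold standard with an L1 (total absolute) error. This is only an illustration with made-up taxa and numbers, not CAMI's official evaluation code; the actual challenge metrics are richer and are implemented in dedicated tools.

```python
def l1_error(truth, pred):
    """Total absolute difference between true and predicted relative
    abundances, summed over all taxa seen in either profile.
    Lower is better; 0 means a perfect profile at this rank."""
    taxa = set(truth) | set(pred)
    return sum(abs(truth.get(t, 0.0) - pred.get(t, 0.0)) for t in taxa)

# Hypothetical relative-abundance profiles at a single taxonomic rank.
truth = {"Bacteroides": 0.6, "Escherichia": 0.3, "Lactobacillus": 0.1}
pred = {"Bacteroides": 0.5, "Escherichia": 0.4, "Clostridium": 0.1}

# Errors: 0.1 + 0.1 + 0.1 (missed Lactobacillus) + 0.1 (false Clostridium)
print(l1_error(truth, pred))
```

Note that the metric penalizes both abundance mis-estimates and presence/absence mistakes (missed and falsely predicted taxa), which is why profilers are typically evaluated on both aspects.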

The second round of challenges: CAMI II

Accompanying advancements in sequencing technologies and experimental setups, CAMI launched its second round of challenges in 2019 with several innovations. The datasets on offer represented commonly studied environments, namely a marine environment and one with high strain diversity, derived from both published and newly sequenced data. In 2020, a plant-associated dataset, including fungal and host plant genome sequences, was released. The data mimicked sequencing technologies that have been gaining popularity, namely Pacific Biosciences and Oxford Nanopore, in addition to Illumina, which was used in the first challenge. A new clinical pathogen detection challenge based on patient data was offered to raise awareness of the potential use of metagenomics in clinical settings. As computational efficiency can be decisive when choosing a method, runtime and memory usage were also measured. Efficient methods and improvements over the first challenge in the other metrics were observed for all task categories (4). Together with the large increase in participation, with over 5,000 submissions, this confirmed the advancements in and importance of the field.

Legacy, future, and how to contribute

Besides the benchmarking results, various resources remain available to the research community. These include the datasets mentioned above, as well as others (5–8), and the CAMISIM software (9) for metagenomic data simulation. The metrics used in CAMI have also been implemented in freely available software: MetaQUAST (10) for metagenome assembly evaluation, AMBER (11) for evaluations of genome and taxonomic binning, and OPAL (12) for evaluations of taxonomic profiling. Best benchmarking practices have been described in a tutorial paper (13). Moreover, a repository of results is available in the CAMI Zenodo community.
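As a small illustration of the kind of metric used for genome binning evaluation, the sketch below computes the purity of a single bin: the fraction of its sequences that originate from the bin's most common true genome. This is a simplified toy version with invented genome labels, not the implementation used in AMBER, which additionally weights sequences by length and reports completeness and further metrics.

```python
from collections import Counter

def bin_purity(true_genomes):
    """Purity of one bin, given the true genome of origin of each
    binned sequence (all sequences weighted equally here): the
    fraction assigned to the bin's majority genome."""
    counts = Counter(true_genomes)
    majority_count = counts.most_common(1)[0][1]
    return majority_count / len(true_genomes)

# Hypothetical bin: three sequences from genome_A, one contaminant.
print(bin_purity(["genome_A", "genome_A", "genome_A", "genome_B"]))  # 0.75
```

High purity alone is not enough: a bin holding a single sequence is perfectly pure, which is why purity is always reported alongside completeness (how much of each genome was recovered).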

CAMI remains open to suggestions, new software results, and data contributions. For details and further resources, we refer the reader to CAMI’s latest paper (4) or our website. As metagenomics continues to advance, continued benchmarking of its methods will remain essential.

References and further resources

  1. Hug, L. A. et al. A new view of the tree of life. Nat Microbiol 1, 16048 (2016).
  2. The Critical Assessment of Metagenome Interpretation (CAMI) competition (2014).
  3. Sczyrba, A. et al. Critical Assessment of Metagenome Interpretation – a benchmark of metagenomics software. Nat. Methods 14, 1063–1071 (2017).
  4. Meyer, F., Fritz, A. et al. Critical Assessment of Metagenome Interpretation – the second round of challenges. Nat. Methods (2022). doi:10.1038/s41592-022-01431-4
  5. Sczyrba, A. et al. Benchmark data sets, software results and reference data for the first CAMI challenge. doi:10.5524/100344
  6. Fritz, A. et al. CAMI 2 - Challenge Datasets. (2021). doi:10.4126/FRL01-006425521
  7. Fritz, A., McHardy, A., Lesker, T. & Bremges, A. CAMI 2 – Multisample benchmark dataset of human microbiome project. (2021). doi:10.4126/FRL01-006425518
  8. Fritz, A., Lesker, T., Bremges, A. & McHardy, A. C. CAMI 2 – Multisample benchmark dataset of mouse gut. (2020). doi:10.4126/FRL01-006421672
  9. Fritz, A. et al. CAMISIM: simulating metagenomes and microbial communities. Microbiome 7, 17 (2019).
  10. Mikheenko, A., Saveliev, V. & Gurevich, A. MetaQUAST: evaluation of metagenome assemblies. Bioinformatics 32, 1088–1090 (2016).
  11. Meyer, F. et al. AMBER: Assessment of Metagenome BinnERs. Gigascience 7, (2018).
  12. Meyer, F. et al. Assessing taxonomic metagenome profilers with OPAL. Genome Biol. 20, 51 (2019).
  13. Meyer, F. et al. Tutorial: assessing metagenomics software with the CAMI benchmarking toolkit. Nat. Protoc. 16, 1785–1801 (2021).

Fernando Meyer

Postdoc, Helmholtz Centre for Infection Research
