Preventing overfitting during the reconstruction of macromolecule images from cryoEM data

Using cryo-electron microscopy (cryoEM), it is possible to obtain information about the three-dimensional structure of macromolecules. Samples can be prepared using, for example, a protocol by Grassucci et al., and EM images are then recorded.

It is important to note that the reconstructed 3D image is generated from many thousands of individual macromolecules (particles). In the figure below, for example, a total of 29,926 particles from 616 CCD frames (4k × 4k frames from a charge-coupled device camera) were used to produce the 4.3 Å resolution reconstruction shown in panel a (taken from Zhang et al., 2010).
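To get a feel for why so many particles are needed, here is a minimal Python sketch of averaging noisy copies of the same signal. It is a 1D toy under simple assumptions (identical, perfectly aligned particles with independent Gaussian noise), not a cryoEM reconstruction, and the function name and parameters are made up for illustration. Under those assumptions the residual noise in the average should fall roughly as one over the square root of the number of particles.

```python
import numpy as np

def average_noisy_copies(signal, n_particles, noise_sigma, rng):
    """Average n_particles noisy copies of the same 1D signal (toy model)."""
    # Each row is one "particle": the true signal plus independent Gaussian noise.
    noise = rng.normal(0.0, noise_sigma, size=(n_particles, signal.size))
    return (signal + noise).mean(axis=0)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 256)
signal = np.sin(3 * x)   # hypothetical "true structure"
noise_sigma = 5.0        # noise much stronger than the signal

for n in (1, 100, 10_000):
    avg = average_noisy_copies(signal, n, noise_sigma, rng)
    residual = np.std(avg - signal)
    expected = noise_sigma / np.sqrt(n)
    print(f"{n:>6d} particles: residual noise {residual:.3f} (roughly {expected:.3f} expected)")
```

With a single copy the signal is buried in noise; with thousands of copies the average starts to resemble the underlying signal, which is the basic reason reconstructions rely on large particle counts.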

Image reconstruction is therefore a significant part of the process of generating meaningful data, and it is important that the averaging process doesn’t overfit the data. My understanding is that overfitting means finding patterns in parts of the data that are really just noise, and that this over-interpretation leads to an over-estimate of the true resolution of the image.
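One way to see how patterns can be "found" in noise is the toy Python sketch below. It is only an illustration of the general idea, not the XMIPP projection-matching protocol or the Scheres and Chen script, and all names in it are hypothetical. It generates images of pure noise, shifts each one to best match a reference shape, and then averages them. The averaged noise should end up resembling the reference, even though no real signal is present.

```python
import numpy as np

def align_to_reference(images, reference):
    """Circularly shift each 1D image so it best matches the reference."""
    # Circular cross-correlation of every image with the reference, via FFT:
    # cc[i, k] = sum_n reference[n] * images[i, (n + k) % size]
    cc = np.fft.ifft(
        np.fft.fft(images, axis=1) * np.conj(np.fft.fft(reference)), axis=1
    ).real
    best_shift = cc.argmax(axis=1)
    # Undo the best shift so the best-matching noise lines up with the reference.
    return np.stack([np.roll(img, -k) for img, k in zip(images, best_shift)])

def reference_bias_demo(n_images=2000, size=64, seed=0):
    """Return correlation with the reference for plain and aligned averages."""
    rng = np.random.default_rng(seed)
    x = np.arange(size)
    # Hypothetical reference shape: one positive bump and one negative bump.
    reference = np.exp(-((x - 20) / 3.0) ** 2) - 0.5 * np.exp(-((x - 45) / 5.0) ** 2)
    # Pure noise: these "particles" contain no signal at all.
    images = rng.normal(size=(n_images, size))
    plain_average = images.mean(axis=0)
    aligned_average = align_to_reference(images, reference).mean(axis=0)
    corr_plain = np.corrcoef(plain_average, reference)[0, 1]
    corr_aligned = np.corrcoef(aligned_average, reference)[0, 1]
    return corr_plain, corr_aligned

corr_plain, corr_aligned = reference_bias_demo()
print(f"correlation with reference, no alignment:   {corr_plain:.2f}")
print(f"correlation with reference, after alignment: {corr_aligned:.2f}")
```

The aligned average correlates with the reference only because the alignment step picked out the noise that happened to look like it. In a real reconstruction the same effect would make the result look more detailed than the data justify.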

One of our protocols for image analysis uses the XMIPP software package. In a recent letter to Nature Methods, Scheres and Chen introduce a script that can be run on top of the conventional projection-matching protocol in XMIPP to help prevent overfitting.

The article can be accessed here:

Prevention of overfitting in cryo-EM structure determination

The script is included in the Supplementary Information (should be accessible without a site licence):


