An increasingly common genomics problem: which variants within a gene are potentially pathogenic? One main tool to discern the pathogenicity of each variant is a genetic perturbation screen—where variants are introduced into a cell line and screened for a cellular phenotype. To do this, you need to be able to somehow identify the phenotype you’re interested in. In some screens, it might be something easily selectable, like chemotherapy drug resistance. It could be something you can tag with a fluorescent marker, which you can then identify with flow cytometry. But many physiologically interesting phenotypes are more complex and can only be discerned by looking under a microscope. We decided to design a way to screen for these phenotypes, and we created Raft-seq, a pooled genetic perturbation screening method for capturing variants that produce a phenotype measurable by imaging.
Being able to screen for phenotypes through imaging is not trivial, and there are two main obstacles: First, how can you take imaging data from a bunch of cells and extract phenotype information? Second, how can you genotype each cell in a way that you can relate the perturbation introduced to that cell to the cell’s microscopy data?
To answer the first question, we use machine learning. After the cells are perturbed and stained, we image them on a high-throughput confocal fluorescence microscope. We then use software that extracts features for each cell from the images, such as mitochondria count or nuclear fluorescence intensity. These features are what we feed into our machine learning models. In our experiments, we also generate feature data for cells that we know have normal or mutant phenotypes. We pick models that can distinguish between those cells and apply them to cells whose phenotype we don’t know ahead of time, generating a prediction of their mutant status.
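To make the train-then-predict idea concrete, here is a minimal sketch using a nearest-centroid classifier. This is only an illustration: the classifier choice, the feature values, and the function names are all assumptions and not the models used in Raft-seq.

```python
from statistics import mean

def centroid(rows):
    """Per-feature mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]

def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train(normal_cells, mutant_cells):
    """'Learn' one centroid per known phenotype."""
    return {"normal": centroid(normal_cells), "mutant": centroid(mutant_cells)}

def predict(model, cell):
    """Assign the label whose centroid is closest to this cell."""
    return min(model, key=lambda label: squared_distance(model[label], cell))

# Hypothetical features per cell: [mitochondria count, nuclear intensity]
normal_cells = [[120, 0.80], [115, 0.82], [130, 0.79]]
mutant_cells = [[40, 0.81], [35, 0.78], [50, 0.80]]

model = train(normal_cells, mutant_cells)

# Cells whose phenotype is not known ahead of time
unknown_cells = [[45, 0.80], [125, 0.81]]
predictions = [predict(model, cell) for cell in unknown_cells]
print(predictions)
```

In practice the features would be rescaled first so that a large-valued feature such as a count does not dominate the distance.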
The second question is slightly thornier to answer. The identifying information for each cell in microscopy data is its position on the tissue culture plate, which is totally lost if you detach the cells and, say, put them through flow cytometry. In the past few years, some methods have been developed to solve this problem. One method uses a digital micromirror device to excite an endogenous fluorophore in specific cells, which can then be sorted by flow cytometry and genotyped in bulk [1]. Another eliminates the need for cell isolation altogether by performing in situ sequencing, i.e., sequencing a perturbation barcode in the cells while they are still on the tissue culture plate [2].
Our solution is a specialized microraft plate, the "raft" in Raft-seq. These plates, from Cell Microsystems, are made up of a grid of rafts, each 100 microns in length and width. The rafts can be individually and automatically released, then magnetically transferred to a 96-well plate. Once the cells are in the 96-well plate, we genotype them to discover the perturbation in each one. By imaging on the microraft plate and keeping track of where on the plate each isolated cell came from, we associate the perturbation (e.g., a CRISPR gRNA) with the microscopy data and the model prediction. The overall workflow of Raft-seq is shown in Fig. 1.
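The bookkeeping that ties everything together is essentially a join on raft position. The sketch below shows one way it could look; the record layout, coordinates, well names, and gRNA labels are hypothetical and chosen only for illustration.

```python
# Hypothetical imaging output: raft (row, column) -> model score for that cell
imaging = {
    (3, 17): 0.91,
    (8, 2): 0.12,
    (10, 44): 0.77,
}

# Hypothetical transfer log: raft (row, column) -> destination well
transfers = {
    (3, 17): "A1",
    (10, 44): "A2",
}

# Hypothetical genotyping result: well -> perturbation found in that cell
genotypes = {"A1": "gRNA_07", "A2": "gRNA_12"}

def link(imaging, transfers, genotypes):
    """Join imaging scores to genotypes through the raft-to-well transfer log."""
    records = []
    for raft, well in transfers.items():
        # Keep only rafts that have both an image score and a genotype
        if raft in imaging and well in genotypes:
            records.append({
                "raft": raft,
                "well": well,
                "perturbation": genotypes[well],
                "score": imaging[raft],
            })
    return records

linked = link(imaging, transfers, genotypes)
for record in linked:
    print(record)
```

Rafts that were imaged but never isolated, such as `(8, 2)` here, simply drop out of the joined table.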
To test Raft-seq, we used mutants of MFN2, a gene known to assist in mitochondrial dynamics. At the cellular level, some mutations in MFN2 cause mitochondria to clump together abnormally. Pathogenic mutations in MFN2 cause Charcot-Marie-Tooth disease, a peripheral neuropathy.
As a proof of concept, we made a cocktail of known pathogenic mutants that caused a range of mitochondrial aberrations from mild to extreme. We then used Raft-seq to "unmix" the pathogenic cells from wild-type cells. For this experiment, we used machine learning models that were "supervised", meaning we fed them data from standalone populations of pathogenic or wild-type cells so they could "learn" the differences. These models were then applied to the pathogenic/wild-type mixture to predict whether each cell had the wild-type or the pathogenic phenotype. We measured performance using the area under the receiver operating characteristic curve (AUC), which summarizes the tradeoff between sensitivity and specificity, and got 0.74 in this setting. Interestingly, we found that our models had higher sensitivity for genotypes with more extreme phenotypes, which correspond to mutants with more severe disease. This validated our approach as a way to better understand the MFN2 phenotype and to potentially identify unknown variants.
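For readers unfamiliar with AUC, it can be read as the probability that a randomly chosen pathogenic cell gets a higher model score than a randomly chosen wild-type cell. The sketch below computes it directly from that definition; the labels and scores are made-up toy values, not data from our experiments.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for pathogenic, 0 for wild-type. Ties count as half a win.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative cell")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy example: genotype-confirmed labels and model scores for six cells
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation.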
Next, we used CRISPR/Cas9 to generate a large library of MFN2 variants, giving us a large set of mutants with unknown phenotypes. For this approach, we used a set of anomaly detection models, which are "semi-supervised". These models don't require a set of known pathogenic mutants; instead, they learn from a population of normal cells and select cells that deviate from it (similar to detecting credit card fraud). We selected cells that were identified as anomalous across many of these models. We isolated a subset of perturbed cells and first genotyped the CRISPR gRNA present in each cell, while also retaining a live cell to create a cell line. We then identified the genetic consequence of the Cas9 edit. The cells with confirmed Cas9 cuts had strong phenotypes compared to wild-type cells, as scored by a model built with mitochondrial features. For one of these cell lines, we evaluated metabolic changes and confirmed that it resembled a known pathogenic mutant.
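The idea behind anomaly detection can be sketched with a very simple stand-in: describe the normal population by its per-feature mean and standard deviation, then flag any cell that sits too far away. The z-score rule, the threshold of 3, and the feature values below are assumptions for illustration and not the models we used.

```python
from statistics import mean, stdev

def fit_reference(normal_cells):
    """Summarize the normal population by per-feature mean and standard deviation."""
    columns = list(zip(*normal_cells))
    return [mean(c) for c in columns], [stdev(c) for c in columns]

def anomaly_score(reference, cell):
    """Largest absolute z-score of the cell across all features."""
    means, sds = reference
    return max(abs(x - m) / s for x, m, s in zip(cell, means, sds))

# Hypothetical features per cell: [mitochondria count, nuclear intensity]
normal_cells = [[120, 0.80], [115, 0.82], [130, 0.79], [125, 0.81], [118, 0.78]]
reference = fit_reference(normal_cells)

# Hypothetical screened cells with unknown phenotype, keyed by a raft label
screened = {"raft_a": [122, 0.80], "raft_b": [40, 0.80]}

threshold = 3.0  # assumed cutoff: more than 3 standard deviations from normal
flagged = [name for name, cell in screened.items()
           if anomaly_score(reference, cell) > threshold]
print(flagged)
```

Note that no mutant cells were needed to set this up, which is what makes the approach usable when the phenotypes in the library are unknown. This toy version would fail on a feature with zero variance in the normal population, so a real pipeline would need to handle that case.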
The main limitation of Raft-seq compared to other methods is scale, as isolating the individual microrafts takes time (currently a few thousand per day is possible). However, Raft-seq has a few advantages over those methods. It is the only platform that can isolate individual live cells, which makes follow-up experiments on screen hits less labor-intensive. Our isolation process also means that we can do single-cell genotyping using standard next-generation sequencing pipelines. We believe that Raft-seq will be an invaluable method for functional variant assessment of complex phenotypes in a growing range of interesting cell lines.
1. Yan X, Stuurman N, Ribeiro SA, Tanenbaum ME, Horlbeck MA, Liem CR, Jost M, Weissman JS, Vale RD. High-content imaging-based pooled CRISPR screens in mammalian cells. J Cell Biol. 2021 Feb 1;220(2):e202008158. doi: https://doi.org/10.1083/jcb.202008158
2. Feldman D, Funk L, Le A, Carlson RJ, Leiken MD, Tsai F, Soong B, Singh A, Blainey PC. Pooled genetic perturbation screens with image-based phenotypes. Nat Protoc. 2022 Feb;17(2):476-512. doi: https://doi.org/10.1038/s41596-021-00653-8