Recent advancements in genetics have enabled us to generate gene expression data with spatial information. Identifying genes that display certain spatial expression (SE) patterns can further our understanding of the tissue organization and the spatial transcriptome landscape. For example, in the olfactory bulb (Figure 1), one might be interested in identifying SE genes (e.g. Gene 2 and Gene p in the figure) – these are the genes that are expressed in certain anatomic structures in the olfactory bulb, such as the nerve layer, the mitral cell layer, or the granular cell layer. Identifying these SE genes is critical for understanding cell identity and function in tissue context. However, identifying SE genes is an extremely challenging statistical and computational task. In our experience, the key difficulty is to obtain calibrated p-values for SE tests.
Figure 1: SPARK performs a rigorous statistical test to identify SE genes in spatially resolved transcriptomic studies. Bottom: Hematoxylin-and-eosin staining of the olfactory bulb, where the expression levels of genes are measured. Middle: Spatial expression pattern of each individual gene. Top: SPARK examines one gene at a time and produces a calibrated p-value testing its SE evidence.
In statistical hypothesis testing, a p-value is the probability, assuming the null hypothesis is true, of obtaining test results at least as extreme as the results observed. However, not all p-values are created equal. When a p-value is calculated correctly, it can be used as a direct measurement of type I error control. For example, a correctly calculated p-value of 0.05 corresponds to a type I error rate of 0.05; that is, if you reject the null hypothesis at a p-value threshold of 0.05, then your type I error rate is guaranteed to be at the desired level of 0.05. In contrast, relying on incorrectly calculated p-values leads to either excessive false discoveries or a loss of statistical power. Calculating a correct p-value is not a trivial task for many statistical hypothesis testing problems, and SE analysis is no exception.
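To make "calibrated" concrete, here is a minimal simulation sketch. It is not part of SPARK: it uses a simple two-sided z-test on null data, and the function names `two_sided_z_pvalue` and `empirical_type1_error` are hypothetical. When the test's assumed standard deviation matches the truth, about 5% of null tests fall below 0.05; when it is misspecified (an assumed illustration of an "incorrectly calculated" p-value), the rejection rate is inflated.

```python
import math
import random


def two_sided_z_pvalue(sample, assumed_sd=1.0):
    """Two-sided z-test p-value for H0: mean = 0, given an assumed sd."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n) / assumed_sd
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def empirical_type1_error(true_sd, assumed_sd, alpha=0.05,
                          n_sims=5000, n=20, seed=0):
    """Fraction of null simulations rejected at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Data are generated under the null: the true mean is 0.
        sample = [rng.gauss(0.0, true_sd) for _ in range(n)]
        if two_sided_z_pvalue(sample, assumed_sd) < alpha:
            rejections += 1
    return rejections / n_sims


# Calibrated: the assumed sd matches the data-generating sd.
calibrated_rate = empirical_type1_error(true_sd=1.0, assumed_sd=1.0)
# Miscalibrated: the test underestimates the sd, so p-values are too small.
inflated_rate = empirical_type1_error(true_sd=1.5, assumed_sd=1.0)
print(calibrated_rate, inflated_rate)
```

The same check applies to any test: simulate data with no true signal, compute p-values, and compare the fraction below each threshold to the threshold itself.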
In SPARK, we adapted several recently developed statistical innovations to make the calculated p-values calibrated. For example, we used a mixture of chi-square distributions as the exact distribution of the resulting test statistics. We also applied the Cauchy combination rule to combine multiple correlated p-values into a single p-value. Combined in SPARK, these distinct and individually complicated procedures yield calibrated p-values!
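The Cauchy combination rule mentioned above can be sketched in a few lines. This is an illustrative version, not SPARK's own code: the function name `cauchy_combine` is hypothetical, and the default of equal weights is an assumption. Each p-value is mapped to a standard Cauchy variate, the variates are averaged with weights that sum to one, and the average is mapped back to a p-value through the Cauchy tail probability.

```python
import math


def cauchy_combine(pvalues, weights=None):
    """Combine p-values with the Cauchy combination rule.

    Assumes every p-value lies strictly between 0 and 1. Weights default
    to equal weights (an assumption made for this sketch) and are
    normalized to sum to one.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * n
    if len(weights) != n:
        raise ValueError("weights and pvalues must have the same length")
    total = sum(weights)
    weights = [w / total for w in weights]

    # tan((0.5 - p) * pi) turns a uniform p-value into a standard Cauchy variate.
    stat = sum(w * math.tan((0.5 - p) * math.pi)
               for w, p in zip(weights, pvalues))

    # Upper-tail probability of a standard Cauchy distribution at `stat`.
    return 0.5 - math.atan(stat) / math.pi


combined = cauchy_combine([0.01, 0.5])
print(combined)
```

A single p-value is returned unchanged, and the combined result is driven mostly by the smallest p-values, which is why the rule stays well behaved when the individual tests are correlated.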
The calibrated p-values from SPARK ensure proper control of type I error while providing statistical power that can be an order of magnitude higher than some existing approaches. We are very excited that SPARK brings SE analysis into a rigorous and effective statistical framework. Of course, SPARK represents only a first step towards comprehensive statistical analysis of spatially resolved transcriptomic studies. With SPARK and its future extensions, we anticipate that important and rigorous biological discoveries will be made in the rapidly evolving world of spatially resolved transcriptomic studies.