These techniques measure the spectrochemical composition of samples such as cells, tissues or biofluids, allowing samples to be discriminated and classified into pre-defined categories, including pathological populations. The molecular “fingerprint” generated by these spectral techniques has proven to be a rich source of chemical information for quantitative and qualitative analysis in plant, food, microbiological and clinical applications.
Figure: Biospectroscopy flowchart.
The use of machine learning algorithms for complex biospectroscopy datasets is increasingly recognised as critical to extracting important information and visualising it in a readily interpretable form. Having recently attended keynote talks at a number of high-profile conferences, we were left with the impression that there are major gaps in general knowledge regarding the fundamentals of designing an appropriate experiment and then applying a suitable algorithm that generates a reliable and robust classification result. This is one of the factors that have held back the implementation of such techniques in general end-user applications, that is, their translation, which is the ultimate goal of such studies.
Our tutorial on multivariate classification for vibrational spectroscopy of biological samples was written to aid and standardize procedures for biospectroscopy data analysis, thus providing guidelines to help young researchers in this area. It supplements previous protocols co-authored by our group, which provide experimental guidelines for Raman or IR spectroscopy of cells or biological samples, and for dataset standardization. Herein, we summarize and provide guidelines for outlier detection, pre-processing techniques, sample selection, model construction and model validation.
Camilo L.M. Morais is an expert in chemometrics who was recently awarded his PhD. Francis L Martin has pioneered the application of spectrochemical techniques with multivariate analysis to a range of fields including oncology, neurodegenerative disease, toxicology and environmental sciences.
Camilo L.M. Morais Francis L Martin