As glycoscience advances, labs will increasingly want to ask questions about glycosylation sites on a protein or the structure of a sugar, says Raja Mazumder, a bioinformatician at George Washington University. They might ask, for example: Are there glycosyltransferases that are expressed in the liver but not in the heart? Or, which ones are overexpressed by a factor of three in more than two cancers? Such questions require infrastructure building, he says, because right now there is no mechanism to allow such queries. But he and others are building such capabilities. Mazumder and William York at the University of Georgia are starting to build a glycoscience informatics portal.
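The two example questions can be made concrete with a small sketch. Everything in it is invented for illustration: the gene names, the expression values, the fold changes, the detection threshold and the function names are assumptions, and none of it reflects the portal that Mazumder and York are building or any real database.

```python
# Hypothetical expression table: gene -> tissue -> expression level (arbitrary units).
EXPRESSION = {
    "GT_A": {"liver": 12.0, "heart": 0.0},
    "GT_B": {"liver": 8.5, "heart": 7.9},
    "GT_C": {"liver": 0.0, "heart": 3.2},
}

# Hypothetical tumor-versus-normal fold changes: gene -> cancer type -> fold change.
FOLD_CHANGE = {
    "GT_A": {"breast": 3.4, "colon": 4.1, "lung": 3.0},
    "GT_B": {"breast": 1.2, "colon": 5.0, "lung": 0.9},
    "GT_C": {"breast": 3.5, "colon": 3.1, "lung": 1.0},
}


def expressed_in_but_not(expression, present, absent, threshold=1.0):
    """Genes at or above `threshold` in `present` and below it in `absent`."""
    return sorted(
        gene for gene, levels in expression.items()
        if levels.get(present, 0.0) >= threshold
        and levels.get(absent, 0.0) < threshold
    )


def overexpressed_in_more_than(fold_change, min_fold=3.0, min_cancers=2):
    """Genes with fold change >= `min_fold` in more than `min_cancers` cancers."""
    return sorted(
        gene for gene, changes in fold_change.items()
        if sum(1 for fc in changes.values() if fc >= min_fold) > min_cancers
    )


# "Expressed in liver but not in the heart"
liver_only = expressed_in_but_not(EXPRESSION, "liver", "heart")
# "Overexpressed by a factor of three in more than two cancers"
broadly_up = overexpressed_in_more_than(FOLD_CHANGE)
print(liver_only, broadly_up)
```

In this toy form both questions are simple filters. The hard part, as Mazumder describes, is the infrastructure that brings tissue, disease and enzyme data into one place where such filters can run.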
Mazumder wants to leverage existing ontologies in the developer community to build systems that can be queried on a large scale. For example, Mazumder is working with Cathy Wu at Georgetown University, who is developing the Protein Ontology. Such ontologies are collected, for example, by the non-profit OBO Foundry. To allow flexible querying, the computational resources will draw on different ontologies: ones that relate to glycans, genes, proteins, tissues, diseases and more.
Ontologies are part of the team’s effort to build application programming interfaces (APIs) that expose the data in a given database to incoming queries. Given how complex sugars are, the informatics framework has to be well organized for both human and machine-based querying, says Mazumder.
When using the resource, a researcher will receive results that also document the search process itself, such as the version of the queried database. “You need to be able to tell where you got that information from,” says Mazumder. Tracking data provenance matters especially in an age when databases continuously integrate information emerging in the literature.
For the Food and Drug Administration, Mazumder is developing computational standards for high-throughput sequencing, which he also wants to apply to glycoscience. His ‘biocompute object’ captures the computational workflow a lab might have used to generate results: the software used, the databases queried and their versions, and identifiers of data inputs and outputs. These biocompute objects are intended to help regulatory scientists interpret submitted work. They can also help scientists generally see if, for example, the version of software they used worked as it should, says Mazumder.
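As a rough illustration of the kind of record described here, the sketch below captures the four ingredients named above: software used, databases queried with their versions, and identifiers of inputs and outputs. It is not the biocompute object format itself. The class names, field names and example values are all assumptions made up for this example.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SoftwareUsed:
    name: str
    version: str


@dataclass
class DatabaseQueried:
    name: str
    version: str


@dataclass
class WorkflowRecord:
    """Illustrative provenance record; field names are assumptions, not a published schema."""
    software: list = field(default_factory=list)
    databases: list = field(default_factory=list)
    input_ids: list = field(default_factory=list)
    output_ids: list = field(default_factory=list)

    def to_json(self):
        # asdict() converts nested dataclasses to plain dicts for serialization.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# All names and versions below are hypothetical placeholders.
record = WorkflowRecord(
    software=[SoftwareUsed("example-aligner", "1.4.2")],
    databases=[DatabaseQueried("example-glycan-db", "2016-03")],
    input_ids=["sample-001"],
    output_ids=["result-001"],
)
print(record.to_json())
```

Serializing the record alongside the results means a later reader can see which database version a finding came from, which is the provenance question Mazumder raises.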
Too often labs use computational tools without benchmarking them, says Mazumder. “It would be unthinkable for a wet-lab scientist to not have a positive and negative control,” he says. In informatics, developers benchmark their software but users often do not have these habits. “They don’t even know: if I don’t find anything, is it because my software did not run well or not?”
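The wet-lab habit carries over to software directly: run the tool on an input where the answer is known to be present, and on one where it is known to be absent, before trusting an empty result. The sketch below does this for a toy stand-in tool that looks for the N-X-S/T sequon (X being any residue except proline) associated with N-linked glycosylation. The tool, the function names and the sequences are assumptions chosen for illustration, not a tool named in this article.

```python
import re

# Toy stand-in for an analysis tool: finds N-X-S/T sequons (X is not proline).
# The lookahead makes each match zero-width, so overlapping sequons are reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return 0-based start positions of candidate sequons."""
    return [m.start() for m in SEQUON.finditer(sequence.upper())]


def run_with_controls(tool, positive, expected, negative):
    """Run `tool` on a positive and a negative control; True if both behave."""
    positive_ok = tool(positive) == expected
    negative_ok = tool(negative) == []
    return positive_ok and negative_ok


controls_ok = run_with_controls(
    find_sequons,
    positive="AANGSAA",  # contains one sequon starting at index 2
    expected=[2],
    negative="AANPSAA",  # proline in the middle position: no sequon
)

query_hits = find_sequons("MKTAYIAK")  # made-up sequence with no asparagine
if not controls_ok:
    print("Controls failed: an empty result cannot be trusted.")
elif not query_hits:
    print("No hits, and the controls passed: the empty result is likely real.")
```

If the positive control had come back empty, the empty query result would say nothing about the sequence and everything about the tool.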
As labs move to big data analysis in genomics and, eventually, in glycoscience, this kind of benchmarking becomes ever more important, says Mazumder. In his view, biocompute objects will help glycobiology researchers communicate with one another about their results, such as where on a protein they found a sugar with a given structure. More generally, they will give glycoscientists a better way to connect the available sugar resources as they pursue their questions of interest.
Here are some resources that glycoscientists can tap into:
|General resources and funding information|
|Transforming Glycoscience: A Roadmap for the Future||Report by the National Research Council of the National Academies of Science|
|NIH Common Fund program in glycoscience||Funding opportunities from the NIH Common Fund program in glycoscience|
|A roadmap for Glycoscience In Europe by BBSRC, EGSF, European Science Foundation||Glycoscience roadmap for Europe|
|GlycoNet||Resources related to glycoscience research in Canada, based at the University of Alberta where the Alberta Glycomics Centre is located|
|National Center for Functional Glycomics||A Glycomics-related Biomedical Technology Resource Center based at Beth Israel Deaconess Medical Center, Harvard Medical School with resources on, for example, microarrays and microarray services, protocols, training and databases|
|Databases and portals|
|CAZy||Carbohydrate-Active Enzymes, a database of enzyme families that degrade, modify or create glycosidic bonds|
|Consortium for Functional Glycomics||Resources and glycoscience data. Part of the National Center for Functional Glycomics.|
|ExPASy||Software tools and databases to simulate, predict and visualize glycans, glycoproteins and glycan-binding proteins|
|Glycan Library||A list of lipid-linked sequence-defined glycan probes|
|Glyco3D||A portal for structural glycoscience|
|GlycoBase 3.2||A database of N- and O-linked glycan structures with HPLC, UPLC, exoglycosidase sequencing and mass spectrometry data|
|GlycoPattern||Portal for glycan array experimental results from the Consortium for Functional Glycomics|
|Glycosciences.de||Collection of databases and tools in glycoscience|
|GlyToucan||Repository for glycan structures based in Japan|
|MatrixDB||A database of experimental data of interactions by proteoglycans, polysaccharides and extracellular matrix proteins|
|Repository of Glyco-enzyme expression constructs||University of Georgia Complex Carbohydrate Research Center repository for glyco-enzyme constructs|
|SugarBind||A database of carbohydrate sequences to which bacteria, toxins and viruses adhere|
|UniCarbKB||A resource curated by scientists in five countries. It includes GlycoSuiteDB, a database of glycan structures; EUROCarbDB, an experimental and structural database; and UniCarb-DB, a mass spec database of glycan structures|
|Software tools|
|CASPER||Web-based tool to calculate NMR chemical shifts of oligo- and polysaccharides|
|Glycan Builder||An online tool at ExPASy for predicting possible oligosaccharide structures on proteins|
|GlycoMiner/GlycoPattern||Software tools to automatically identify mass spec spectra of N-glycopeptides|
|GlyMAP||An online resource for mapping glyco-active enzymes|
|NetOGlyc||Software tool for predicting O-glycosylation sites on proteins|
|SweetUnityMol||Molecular visualization software|
Sources: NIH, R. Mazumder, George Washington University; New England Biolabs, Thermo Fisher Scientific, Nature Research