Computable sugars: some computational resources in glycoscience

Jun 29, 2017

Glycoscience is sweet science (PhotoDisc/Getty Images)


As glycoscience advances, labs will increasingly want to ask questions about glycosylation sites on a protein or the structure of a sugar, says Raja Mazumder, a bioinformatician at George Washington University. They might ask, for example: Which glycosyltransferases are expressed in the liver but not in the heart? Which ones are overexpressed threefold in more than two cancers? Answering such questions requires new infrastructure, he says, because at present there is no mechanism for running such queries. But he and others are building these capabilities. Mazumder and William York at the University of Georgia are starting to build a glycoscience informatics portal.
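The two example questions amount to filters over expression data. Here is a minimal Python sketch, assuming a toy in-memory table: the record layout, the gene names GT_A to GT_C and every number in it are hypothetical placeholders, not values from any real database. "More than two cancers" is read as at least three, and "overexpressed by a factor of three" as a tumor-to-normal ratio of at least 3.0.

```python
from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    """Hypothetical record for one glycosyltransferase gene."""
    gene: str
    tissues: set[str]  # tissues in which expression was detected
    # cancer type -> tumor/normal expression ratio
    fold_change: dict[str, float] = field(default_factory=dict)


def expressed_in_not_in(records, present, absent):
    """Genes expressed in tissue `present` but not in tissue `absent`."""
    return sorted(
        r.gene for r in records
        if present in r.tissues and absent not in r.tissues
    )


def overexpressed_in_many(records, min_fold=3.0, min_cancers=3):
    """Genes overexpressed at least `min_fold` in at least `min_cancers` cancers."""
    hits = []
    for r in records:
        n = sum(1 for fc in r.fold_change.values() if fc >= min_fold)
        if n >= min_cancers:
            hits.append(r.gene)
    return sorted(hits)


# Toy data: names and values are made up for illustration only.
records = [
    GeneRecord("GT_A", {"liver"}, {"cancer_1": 3.5, "cancer_2": 4.0, "cancer_3": 3.1}),
    GeneRecord("GT_B", {"liver", "heart"}, {"cancer_1": 5.0}),
    GeneRecord("GT_C", {"heart"}, {"cancer_1": 3.0, "cancer_2": 2.0,
                                   "cancer_3": 6.0, "cancer_4": 3.2}),
]

print(expressed_in_not_in(records, "liver", "heart"))  # ['GT_A']
print(overexpressed_in_many(records))                  # ['GT_A', 'GT_C']
```

A real portal would fetch such records from curated resources through its APIs rather than hard-coding them; the point here is only that each question reduces to a simple, explicit filter once the data are linked.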

Mazumder wants to leverage ontologies already in use in the developer community to build systems that can be queried on a large scale. For example, he is working with Cathy Wu at Georgetown University, who is developing the Protein Ontology. Such ontologies are collected by groups such as the non-profit OBO Foundry. To allow flexible querying, the computational resources will draw on ontologies that relate to glycans, genes, proteins, tissues, diseases and more.

Ontologies are part of the team’s effort to build application programming interfaces (APIs) that expose the data in a given database to incoming queries. Given how complex sugars are, the informatics framework has to be well organized for querying by both humans and machines, says Mazumder.

Researchers using the resource will receive results that also document the search process itself, such as the version of the database that was queried. “You need to be able to tell where you got that information from,” says Mazumder. Tracking data provenance matters all the more now that databases continuously integrate new information from the literature.
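One simple way to picture results that document their own search process is to return each answer bundled with a provenance record. The sketch below is a hypothetical illustration, not the portal's actual response format: the field names, the database name `example-glyco-db` and the result row are all made up.

```python
import json
from datetime import datetime, timezone


def with_provenance(results, database, version, query):
    """Bundle query results with a record of how they were obtained."""
    return {
        "results": results,
        "provenance": {
            "database": database,          # which resource was queried
            "database_version": version,   # which release of it
            "query": query,                # what was asked
            "retrieved_at": datetime.now(timezone.utc).isoformat(),
        },
    }


# Hypothetical example: the protein name and site are placeholders.
response = with_provenance(
    results=[{"protein": "EXAMPLE_PROTEIN", "site": 42}],
    database="example-glyco-db",
    version="1.0",
    query="glycosylation sites for EXAMPLE_PROTEIN",
)
print(json.dumps(response, indent=2))
```

Because the version and timestamp travel with the answer, a result saved today can still be traced to the exact database release it came from after that database has integrated new findings.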

For the Food and Drug Administration, Mazumder is developing computational standards for high-throughput sequencing, which he also wants to apply to glycoscience. His ‘biocompute object’ captures the computational workflow a lab used to generate its results: the software, the databases queried and their versions, and the identifiers of data inputs and outputs. Biocompute objects are intended to help regulatory scientists interpret submitted work. They can also help scientists more generally check whether, for example, the version of software they used worked as it should, says Mazumder.
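The ingredients listed above map naturally onto a small data structure. This Python sketch is a simplified, hypothetical stand-in: the class and field names are illustrative, and it does not reproduce any formal biocompute object schema. The tool and database names in the usage lines are placeholders.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Tool:
    name: str
    version: str


@dataclass
class DatabaseRef:
    name: str
    version: str


@dataclass
class BiocomputeRecord:
    """Hypothetical, simplified record of a computational workflow."""
    workflow_name: str
    software: list[Tool] = field(default_factory=list)
    databases: list[DatabaseRef] = field(default_factory=list)
    input_ids: list[str] = field(default_factory=list)
    output_ids: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts nested dataclasses to plain dicts
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    def software_version(self, name: str):
        """Return the recorded version of a tool, or None if it was not used."""
        for tool in self.software:
            if tool.name == name:
                return tool.version
        return None


# Placeholder names and identifiers, for illustration only.
record = BiocomputeRecord(
    workflow_name="glycosite-prediction-example",
    software=[Tool("predictor_x", "2.1")],
    databases=[DatabaseRef("reference_db_y", "2017_06")],
    input_ids=["sample-001"],
    output_ids=["sites-001"],
)
print(record.to_json())
```

Recording versions alongside names is what lets a later reader, whether a regulatory scientist or a colleague, ask which release of a tool or database produced a given result.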

Too often, labs use computational tools without benchmarking them, says Mazumder. “It would be unthinkable for a wet-lab scientist to not have a positive and negative control,” he says. In informatics, developers benchmark their software, but users often lack that habit. “They don’t even know: if I don’t find anything, is it because my software did not run well or not?”
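Mazumder's analogy to wet-lab controls can be applied directly to software: run the tool on an input where a hit is certain and on one where none is expected before trusting an empty result on real data. In this Python sketch the "tool" is a simple scan for the N-X-S/T motif (X any residue except proline), the well-known consensus sequence for N-glycosylation. It is a stand-in, not one of the prediction tools in the resource list, and a motif match does not mean a site is actually glycosylated. The control and sample sequences are made up for illustration.

```python
import re

# N-X-S/T sequon, X any residue except proline; the lookahead
# lets overlapping motifs (e.g. "NNST") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 0-based positions of N in N-X-S/T motifs (a simple stand-in tool)."""
    return [m.start() for m in SEQUON.finditer(protein.upper())]


def run_with_controls(tool, sample, positive, negative):
    """Run `tool` on controls first, so an empty result on `sample` can be trusted."""
    if not tool(positive):
        raise RuntimeError("positive control failed: no hit where one is expected")
    if tool(negative):
        raise RuntimeError("negative control failed: hits where none are expected")
    return tool(sample)


# Hypothetical sequences: the positive control contains N-G-T,
# the negative control contains only N-P-T, which should not match.
hits = run_with_controls(
    find_sequons,
    sample="MKTAYIAKQR",
    positive="AANGTAA",
    negative="AANPTAA",
)
print(hits)  # []
```

Here the empty result on the sample is meaningful: the same tool found the motif in the positive control and correctly ignored the negative one, so "nothing found" is not a symptom of a broken run.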

As labs move to big-data analysis in genomics and, eventually, in glycoscience, this aspect becomes ever more important, says Mazumder. In his view, biocompute objects will help glycobiology researchers communicate with one another about their results, such as where on a protein they found a sugar with a given structure. More generally, they will give glycoscientists a better way to connect the available sugar resources as they pursue their research questions.


Here are some resources that glycoscientists can tap into:

General resources and funding information
“Transforming Glycoscience: A Roadmap for the Future”: report by the National Research Council of the National Academies
NIH Common Fund program in glycoscience: funding opportunities in glycoscience from the NIH Common Fund
“A Roadmap for Glycoscience in Europe”: glycoscience roadmap for Europe by BBSRC, EGSF and the European Science Foundation
GlycoNet: resources related to glycoscience research in Canada, based at the University of Alberta, where the Alberta Glycomics Centre is located
National Center for Functional Glycomics: a glycomics-related Biomedical Technology Resource Center at Beth Israel Deaconess Medical Center, Harvard Medical School, with resources such as microarrays and microarray services, protocols, training and databases

Databases and portals
CAZy: Carbohydrate-Active Enzymes, a database of enzyme families that degrade, modify or create glycosidic bonds
Consortium for Functional Glycomics: resources and glycoscience data; part of the National Center for Functional Glycomics
ExPASy: software tools and databases to simulate, predict and visualize glycans, glycoproteins and glycan-binding proteins
Glycan Library: a list of lipid-linked, sequence-defined glycan probes
Glyco3D: a portal for structural glycoscience
GlycoBase 3.2: a database of N- and O-linked glycan structures with HPLC, UPLC, exoglycosidase sequencing and mass spectrometry data
GlycoPattern: a portal for glycan array experimental results from the Consortium for Functional Glycomics
Glycosciences.de: a collection of databases and tools in glycoscience
GlyToucan: a repository for glycan structures, based in Japan
MatrixDB: a database of experimental data on interactions involving proteoglycans, polysaccharides and extracellular matrix proteins
Repository of Glyco-enzyme expression constructs: the University of Georgia Complex Carbohydrate Research Center repository for glyco-enzyme constructs
SugarBind: a database of carbohydrate sequences to which bacteria, toxins and viruses adhere
UniCarbKB: a resource curated by scientists in five countries. It includes GlycoSuiteDB, a database of glycan structures; EUROCarbDB, an experimental and structural database; and UniCarb-DB, a mass spectrometry database of glycan structures

Software tools
CASPER: a web-based tool to calculate NMR chemical shifts of oligo- and polysaccharides
Glycan Builder: an online tool at ExPASy for predicting possible oligosaccharide structures on proteins
GlycoMiner/GlycoPattern: software tools to automatically identify mass spectra of N-glycopeptides
GlyMAP: an online resource for mapping glyco-active enzymes
NetOGlyc: a software tool for predicting O-glycosylation sites on proteins
SweetUnityMol: molecular visualization software

Sources: NIH, R. Mazumder, George Washington University; New England Biolabs, Thermo Fisher Scientific, Nature Research

Vivien Marx

Journalist, Nature Research
