SPRINT-Gly: predicting N- and O-linked glycosylation sites of human and mouse proteins by using sequence and predicted structural properties

Taherzadeh, Ghazaleh, Dehzangi, Abdollah, Golchin, Maryam, Zhou, Yaoqi, and Campbell, Matthew P. (2019) SPRINT-Gly: predicting N- and O-linked glycosylation sites of human and mouse proteins by using sequence and predicted structural properties. Bioinformatics, 35 (20). pp. 4140-4146.

PDF (Published Version): restricted to repository staff only

View at Publisher Website: https://doi.org/10.1093/bioinformatics/b...
 
1


Abstract

Motivation: Protein glycosylation is one of the most abundant post-translational modifications and plays an important role in immune responses, intercellular signaling, inflammation and host-pathogen interactions. However, because of the poor ionization efficiency and microheterogeneity of glycopeptides, identifying glycosylation sites is challenging, and there is a demand for computational methods. Here, we constructed the largest dataset of human and mouse glycosylation sites and used it to train deep learning neural networks and support vector machine classifiers to predict N-linked and O-linked glycosylation sites, respectively.

Results: The method, called SPRINT-Gly, achieved consistent results between ten-fold cross-validation and an independent test for predicting human and mouse glycosylation sites. For N-glycosylation, a mouse-trained model performs equally well on human glycoproteins and vice versa; however, because O-linked sites differ significantly between the two species, separate models were generated for O-glycosylation. Overall, SPRINT-Gly achieves a Matthews correlation coefficient 18% and 50% higher than the next-best compared method for N-linked and O-linked sites, respectively. The authors attribute this improvement to the inclusion of novel structure- and sequence-based features.
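The comparison above is reported in terms of the Matthews correlation coefficient (MCC), which combines all four confusion-matrix counts: MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). It ranges from −1 to 1 and is commonly preferred when, as with glycosylation sites, positives are much rarer than negatives. The sketch below is a generic textbook definition, not the authors' evaluation code; the function names `mcc` and `mcc_from_labels` are hypothetical, and returning 0.0 when the denominator is zero is an assumed convention.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Assumed convention: if any marginal sum is zero (denominator 0),
    return 0.0 instead of raising ZeroDivisionError.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def mcc_from_labels(y_true, y_pred) -> float:
    """Compute MCC from paired binary labels (1 = glycosylated site, 0 = not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    return mcc(tp, tn, fp, fn)


if __name__ == "__main__":
    # Perfect prediction gives 1.0; one missed positive out of two lowers it.
    print(mcc(5, 5, 0, 0))
    print(mcc_from_labels([1, 1, 0, 0], [1, 0, 0, 0]))
```

Because MCC is undefined when a predictor outputs only one class, libraries and papers differ on how they report that case; check which convention a given comparison uses before reading relative differences such as "18% higher".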

Item ID: 76954
Item Type: Article (Research - C1)
ISSN: 1367-4811
Copyright Information: © The Author(s) 2019. Published by Oxford University Press. All rights reserved.
Funders: Australian Research Council (ARC), National Health and Medical Research Council of Australia (NHMRC)
Projects and Grants: ARC DP180102060, NHMRC 1121629
Date Deposited: 07 Dec 2022 00:09
FoR Codes: 31 BIOLOGICAL SCIENCES > 3102 Bioinformatics and computational biology > 310208 Translational and applied bioinformatics @ 100%
SEO Codes: 20 HEALTH > 2099 Other health > 209999 Other health not elsewhere classified @ 100%
