SPRINT-Gly: predicting N- and O-linked glycosylation sites of human and mouse proteins by using sequence and predicted structural properties
Taherzadeh, Ghazaleh, Dehzangi, Abdollah, Golchin, Maryam, Zhou, Yaoqi, and Campbell, Matthew P. (2019) SPRINT-Gly: predicting N- and O-linked glycosylation sites of human and mouse proteins by using sequence and predicted structural properties. Bioinformatics, 35 (20). pp. 4140-4146.
PDF (Published Version): restricted to repository staff only
Abstract
Motivation: Protein glycosylation is one of the most abundant post-translational modifications and plays an important role in immune responses, intercellular signaling, inflammation and host-pathogen interactions. However, because of the poor ionization efficiency and microheterogeneity of glycopeptides, identifying glycosylation sites is a challenging task, and there is a demand for computational methods. Here, we constructed the largest dataset of human and mouse glycosylation sites to train deep learning neural networks and support vector machine classifiers to predict N- and O-linked glycosylation sites, respectively.
Results: The method, called SPRINT-Gly, achieved consistent results between ten-fold cross-validation and independent testing for predicting human and mouse glycosylation sites. For N-glycosylation, a mouse-trained model performs equally well on human glycoproteins and vice versa; however, because of significant differences in O-linked sites, separate models were generated. Overall, SPRINT-Gly achieves a Matthews correlation coefficient 18% and 50% higher than the next best method compared for N-linked and O-linked sites, respectively. This improved performance is due to the inclusion of novel structure- and sequence-based features.
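The comparison above is stated in terms of the Matthews correlation coefficient (MCC). As a minimal sketch, assuming each candidate site is labeled as glycosylated or not and the predictions are summarized as confusion-matrix counts, MCC can be computed as shown here. The function name `mcc` and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns a value in [-1, 1]; 0.0 is returned when any marginal total
    is zero, a common convention for the undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical counts for illustration only (not results from the paper).
example_score = mcc(tp=90, tn=85, fp=15, fn=10)
print(f"MCC = {example_score:.3f}")
```

MCC uses all four cells of the confusion matrix, which makes it a common choice when positive and negative sites are imbalanced.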
| Item ID: | 76954 |
|---|---|
| Item Type: | Article (Research - C1) |
| ISSN: | 1367-4811 |
| Copyright Information: | © The Author(s) 2019. Published by Oxford University Press. All rights reserved. |
| Funders: | Australian Research Council (ARC), National Health and Medical Research Council of Australia (NHMRC) |
| Projects and Grants: | ARC DP180102060, NHMRC 1121629 |
| Date Deposited: | 07 Dec 2022 00:09 |
| FoR Codes: | 31 BIOLOGICAL SCIENCES > 3102 Bioinformatics and computational biology > 310208 Translational and applied bioinformatics @ 100% |
| SEO Codes: | 20 HEALTH > 2099 Other health > 209999 Other health not elsewhere classified @ 100% |