Characterization and identification of protein O-GlcNAcylation sites with substrate specificity (#12)
Background:
Protein O-GlcNAcylation, the attachment of a single N-acetylglucosamine (GlcNAc) to the hydroxyl group of serine or threonine residues, is catalyzed by O-GlcNAc transferase (OGT). Identifying O-GlcNAcylation sites on proteins is necessary to decipher the crucial roles of this modification in regulating cellular processes and to aid drug design. With an increasing number of O-GlcNAcylation sites identified by mass spectrometry (MS)-based proteomics, several methods have been proposed for the computational identification of O-GlcNAcylation sites. However, none of these methods has focused on the investigation of OGT substrate motifs. We were therefore motivated to design a new method for identifying protein O-GlcNAcylation sites that takes the substrate site specificity of OGT into account.
Results:
In this study, 375 experimentally verified O-GlcNAcylation sites were collected from dbOGAP, an integrated resource for protein O-GlcNAcylation. Because the substrate motifs are difficult to characterize by conventional sequence logo analysis, a recursive statistical method was applied to obtain statistically significant conserved motifs. Support Vector Machines (SVMs) were then adopted to construct a two-layered predictive model learned from the identified substrate motifs. The predictive model was evaluated using five-fold cross-validation, which yielded a sensitivity of 0.76, a specificity of 0.80, and an accuracy of 0.78. Additionally, an independent testing set, which was completely blind to the training data of the predictive model, was used to demonstrate that the proposed method could provide a promising accuracy (0.94) and outperform three other O-GlcNAcylation site prediction tools.
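For readers unfamiliar with the reported measures, the following minimal Python sketch shows how sensitivity, specificity, and accuracy are conventionally computed from binary site predictions. It is an illustration only, not the authors' evaluation code: the function names (`confusion_counts`, `evaluate`) and the toy labels are hypothetical and do not reproduce the figures reported above.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = modified site, 0 = non-site)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred):
    """Return sensitivity, specificity, and accuracy as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        # Fraction of true sites that were predicted as sites
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Fraction of non-sites that were predicted as non-sites
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # Fraction of all predictions that were correct
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Hypothetical toy labels, for illustration only (not data from the study)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

In a five-fold cross-validation, these measures would typically be computed on each held-out fold and then averaged, or computed once over the pooled held-out predictions; which of the two the study used is not stated here.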
Conclusion:
A case study demonstrated that the proposed method could be a feasible means of conducting preliminary analyses of protein O-GlcNAcylation. We also propose that the identified substrate motifs may facilitate the study of the extensive crosstalk between O-GlcNAcylation and phosphorylation. This method may help unravel the mechanisms and roles of these modifications in signaling, transcription, chronic disease, and cancer.