Predicting Protein-Binding Nucleotides with Consideration of a Binding Partner of RNA (#216)
Background
In recent years, several computational methods have been developed to predict RNA-binding sites in proteins. Most of these methods do not consider the interacting partners of a protein, so they predict the same RNA-binding sites for a given protein sequence even when the protein binds to different RNAs. In contrast, the complementary problem of predicting protein-binding sites in RNA has received much less attention.
Results
In this study, we identified effective features of RNA and protein molecules and developed a support vector machine (SVM) model to predict protein-binding nucleotides from RNA and protein sequence data. The model that used both protein and RNA sequence data achieved an accuracy of 86.3% and a Matthews correlation coefficient (MCC) of 0.69 in 10-fold cross-validation, and an accuracy of 79.2% and an MCC of 0.48 in independent testing. For comparison, we built a second SVM model that used RNA sequence data alone. This model achieved an accuracy of 82.2% and an MCC of 0.63 in 10-fold cross-validation, and an accuracy of 75.5% and an MCC of 0.45 in independent testing.
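As a reminder of how the two reported metrics are defined, the following is a minimal sketch that computes accuracy and MCC from the four confusion-matrix counts of a binary classifier (binding vs. non-binding nucleotide). The function names and the example counts are illustrative assumptions and are not taken from the study.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of nucleotides whose label was predicted correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal sum is zero, a common convention
    for the otherwise undefined 0/0 case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not data from the study).
    tp, tn, fp, fn = 40, 45, 5, 10
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
    print(f"MCC      = {mcc(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, MCC uses all four counts symmetrically, so it remains informative when binding and non-binding nucleotides are unevenly represented.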
Conclusions
In both cross-validation and independent testing, the model that used both RNA and protein sequences performed better than the model that used RNA sequence data alone. Unlike previous computational approaches, which predict the RNA- or DNA-binding residues in a protein sequence without considering the binding partners of the target protein, our model predicts different binding sites for a given RNA sequence when its binding partner changes. To the best of our knowledge, this is the first sequence-based method for predicting protein-binding nucleotides that considers the binding partner of the RNA.
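To illustrate why a partner-aware model can give different predictions for the same RNA, the sketch below builds a per-nucleotide feature vector by concatenating an RNA-derived part with a protein-derived part. The specific encoding (a one-hot window over the RNA plus the amino-acid composition of the protein), the window size, and all function names are assumptions made for illustration; the abstract does not specify the features actually used.

```python
NUCLEOTIDES = "ACGU"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def rna_window_one_hot(rna, center, half_width=2):
    """One-hot encode the nucleotides in a window around `center`.

    Positions that fall outside the sequence are encoded as all zeros.
    """
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        base = rna[pos] if 0 <= pos < len(rna) else None
        features.extend(1.0 if base == n else 0.0 for n in NUCLEOTIDES)
    return features


def protein_composition(protein):
    """Relative frequency of each of the 20 standard amino acids."""
    length = len(protein) or 1
    return [protein.count(a) / length for a in AMINO_ACIDS]


def partner_aware_features(rna, center, protein, half_width=2):
    """Hypothetical feature vector: RNA window part + protein part."""
    return rna_window_one_hot(rna, center, half_width) + protein_composition(protein)


if __name__ == "__main__":
    rna = "GGAUCC"                # toy sequences, not from the study
    a = partner_aware_features(rna, 2, "MKRRA")
    b = partner_aware_features(rna, 2, "GGSSG")
    print(a[:20] == b[:20])       # RNA part is identical
    print(a[20:] == b[20:])       # protein part differs
```

Because the protein-derived part of the vector changes with the partner, a classifier trained on such vectors can assign different labels to the same nucleotide, whereas an RNA-only model sees identical input in both cases.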