Validation of transcripts assembled from RNA-seq data using proteomics data. (#266)
Alternative splicing of mRNA is known to play a major role in diversifying protein function in humans, resulting in cell-specific proteomic variation. While transcriptomic experiments have enabled the detection of many alternatively spliced transcripts, the majority of these transcripts seem to lack protein-coding potential. Earlier this year, we published the Proteomic–Genomic Nexus (PG Nexus) pipeline (Pang et al., 2014), which showed that integrating proteomics and transcriptomics data can efficiently validate alternatively spliced transcripts. To accommodate the analysis of novel transcripts assembled from RNA-seq data, we have developed an additional software component. The new module, TranscriptCoder, uses previously identified exons to translate RNA transcripts from Cufflinks into a protein sequence database for MS/MS searches. In conjunction with the other PG Nexus tools, it allows users to identify and validate splice junction boundaries and mRNA transcripts that have protein-coding potential. For our analyses, we used data from human undifferentiated mesenchymal stem cells (MSC) and validated 4187 unique splice junctions annotated in ENSEMBL. We compared our method with other approaches, including a 3-frame translation of all RNA-seq transcripts, which identified 4472 unique splice junctions; 4062 of these were also identified by TranscriptCoder. A further 125 junctions were found by TranscriptCoder only, and 410 by 3-frame translation only. These results suggest that combining both methods could increase the coverage of splice junction peptides. Our tool and results highlight an integrative approach, now incorporated into our PG Nexus pipeline, that allows us to validate alternatively spliced protein isoforms.
- Pang et al. (2014) J. Proteome Res., 13, 84–98.
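
The junction counts reported above can be checked with simple set arithmetic. The sketch below is illustrative only and is not part of PG Nexus or TranscriptCoder: the function names and the tuple representation of a junction (chromosome, donor position, acceptor position, strand) are assumptions made for this example. The combined total it derives is plain arithmetic on the reported counts, not a figure stated in the abstract.

```python
# Illustrative sketch only: these helpers are hypothetical and are not
# part of the PG Nexus or TranscriptCoder code base.

def overlap_from_counts(total_a, total_b, shared):
    """Derive method-specific and combined counts from two totals and their overlap."""
    if shared > min(total_a, total_b):
        raise ValueError("shared count cannot exceed either total")
    only_a = total_a - shared
    only_b = total_b - shared
    return {
        "shared": shared,
        "only_a": only_a,
        "only_b": only_b,
        "union": only_a + only_b + shared,
    }


def compare_junction_sets(a, b):
    """Same breakdown, computed from two sets of junction identifiers.

    A junction is assumed here to be a hashable tuple such as
    (chromosome, donor_position, acceptor_position, strand).
    """
    return {
        "shared": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "union": len(a | b),
    }


# Counts reported in the abstract: a = TranscriptCoder, b = 3-frame translation.
summary = overlap_from_counts(total_a=4187, total_b=4472, shared=4062)
print(summary)

# Toy junction sets with made-up coordinates, to show the set-based form.
toy_a = {("chr1", 100, 200, "+"), ("chr1", 300, 400, "+")}
toy_b = {("chr1", 300, 400, "+"), ("chr2", 50, 90, "-")}
print(compare_junction_sets(toy_a, toy_b))
```

With the reported totals, the method-specific counts come out as 125 and 410, matching the abstract, and the two methods together cover 4597 distinct junctions (a derived value), which is the sense in which combining them increases coverage.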