Integrative analysis of multi-omics data to discover novel protein forms (#222)
Rattus norvegicus (Norway rat) is a model organism for the study of human diseases, but annotation of the rat genome lags behind similar efforts in human and mouse. Because RNA splicing patterns are reported to be species-specific in rat and mouse, mouse annotations are unlikely to be a suitable surrogate for rat proteins, and rat-specific protein evidence is urgently needed. We have developed a novel analysis pipeline, EuGenoSuite, and used it to analyse publicly available RNA-Seq and mass spectrometry datasets to improve the rat genome annotation. Using EuGenoSuite, 276 uniquely mapping novel peptides were discovered in rat brain microglia. Among these, 145 mapped to intergenic regions, 28 to annotated non-coding loci, 25 to UTRs of genes, 14 to intronic regions, 18 to a translation frame different from that of the annotated CDS, and 45 were splice-junction peptides. Most intergenic novel peptides corresponded to unannotated parts of genes, peptides from non-coding loci indicated translation of eight annotated pseudogenes, and novel splice-junction peptides identified novel exons and splice variants. Our analysis highlights major shortcomings in current annotations of the rat genome, and improved annotations will provide a better reference for human disease studies. We are now extending this approach to the discovery of novel protein forms specific to aggressive forms of breast cancer.
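The location categories reported above (intergenic, non-coding, UTR, intronic, alternative frame, splice junction) can be illustrated with a minimal sketch of how a mapped peptide might be assigned to one of them. This is not the EuGenoSuite implementation: the `Gene` record, the `classify` function, and the category labels are hypothetical, and the sketch assumes plus-strand genes, 0-based half-open coordinates, and CDS segments sorted by position.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end); an assumption


@dataclass
class Gene:
    """Hypothetical, simplified gene model (plus strand only)."""
    start: int
    end: int
    coding: bool
    exons: List[Interval]
    cds: List[Interval]  # sorted by start; empty for non-coding loci


def _within(block: Interval, intervals: List[Interval]) -> bool:
    # True if the block lies entirely inside one of the intervals.
    return any(s <= block[0] and block[1] <= e for s, e in intervals)


def _cds_frame(pos: int, cds: List[Interval]) -> int:
    # Frame of a genomic position relative to the annotated CDS start,
    # counting only coding bases (introns are skipped).
    offset = 0
    for s, e in cds:
        if s <= pos < e:
            return (offset + pos - s) % 3
        offset += e - s
    raise ValueError("position is not inside the CDS")


def classify(blocks: List[Interval], genes: List[Gene]) -> str:
    """Assign a peptide, given as one or more genomic blocks, to a category."""
    if len(blocks) > 1:
        return "splice"  # the peptide spans a splice junction
    block = blocks[0]
    gene = next((g for g in genes
                 if g.start < block[1] and block[0] < g.end), None)
    if gene is None:
        return "intergenic"
    if not gene.coding:
        return "non-coding"
    if _within(block, gene.cds):
        in_frame = _cds_frame(block[0], gene.cds) == 0
        return "annotated-frame" if in_frame else "alt-frame"
    if _within(block, gene.exons):
        return "UTR"
    # Simplification: blocks straddling an exon boundary also end up here.
    return "intronic"
```

A production pipeline would also need to handle minus-strand genes, overlapping genes, and peptides that straddle feature boundaries; the sketch only shows the order in which the categories could be tested.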