Pathway analysis and transcriptomics improve protein identification by shotgun proteomics from samples comprising small number of cells – a benchmarking study (#35)
BACKGROUND: Proteomics research is enabled by high-throughput technologies, but our ability to identify the expressed proteome is limited in small samples. The coverage and consistency of proteome expression are critical problems in proteomics. Here, we propose pathway analysis and the combination of microproteomics and transcriptomics analyses to improve mass-spectrometry protein identification from small samples.
RESULTS: Multiple proteomics runs using the MCF-7 cell line detected 4,957 expressed proteins. About 80% of expressed proteins were present in the MCF-7 transcript data; highly expressed transcripts are more likely to have expressed proteins. Approximately 1,000 proteins were detected in each small-sample proteomics run, and more than 4,000 proteins were extracted from the gene sets representing canonical pathways. The identified canonical pathways largely overlapped between individual runs. Of the identified pathways, 182 were shared among three individual small-sample runs.
CONCLUSIONS: Current technologies enable us to directly detect 10% of the expressed proteome from small samples comprising as few as 50 cells. We used knowledge-based approaches to elucidate the missing proteome, which can then be verified by targeted proteomics. These approaches include pathway analysis and the combination of gene expression and protein expression data for target prioritization. Proteins present in canonical pathways represent approximately 50% of the expressed proteome, and 90% of targets from canonical pathways were estimated to be expressed. Highly expressed transcripts indicate a high probability of protein expression. However, approximately 10% of expressed proteins are not matched with the expressed transcripts.