The Bacterial Proteogenomic Pipeline — ASN Events

The Bacterial Proteogenomic Pipeline (#34)

Julian Uszkoreit 1, Nicole Plohnke 1, Sascha Rexroth 1, Martin Eisenacher 1
  1. Ruhr-Universitaet Bochum, Bochum, Germany

Proteogenomics combines cutting-edge methods from genomics and proteomics. While sequencing whole genomes has become cheap, correctly annotating the protein-coding regions of a genome is still tedious and error-prone. Mass spectrometry, on the other hand, relies on well-characterized protein sequences derived from the genome, but it can also be used to help improve genome annotation or to find species-specific peptides. Additionally, proteomics is widely used to find evidence for differential expression of proteins under different conditions, e.g. growth conditions for bacteria. Although the concept of proteogenomics is not new, mainly in-house scripts or specialized tools for eukaryotic and human analyses have been developed so far.

The Bacterial Proteogenomic Pipeline, which is written entirely in Java, facilitates proteogenomic analyses of bacteria. From a given genome sequence, a naïve six-frame translation is performed and, if desired, a decoy database is generated. This database is used to identify MS/MS spectra with common peptide identification algorithms. After the search results have been combined and optionally flagged for different experimental conditions, they can be browsed and inspected further. In particular, for each peptide, the number of identifications per condition and the positions in the corresponding protein sequences are shown. Intermediate and final results can be exported in GFF3 format for visualization in common genome browsers.
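
The first and last steps described above can be illustrated with a minimal Java sketch. It is not taken from the pipeline itself: the class name `SixFrameSketch`, all method names, the `sketch` source label and the `sequence_feature` type are hypothetical choices made for this example. It assumes the standard codon-to-amino-acid assignments, translates each frame as one continuous string (without splitting at stop codons), and uses simple sequence reversal as the decoy strategy, which is one common option and not necessarily the one the pipeline uses.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class SixFrameSketch {

    // Standard genetic code with codons enumerated in TCAG order; '*' marks a stop codon.
    private static final String BASES = "TCAG";
    private static final String AMINO_ACIDS =
            "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    // Reverse complement of a DNA string; any symbol other than A, C, G, T becomes N.
    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            switch (Character.toUpperCase(dna.charAt(i))) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                default:  sb.append('N');
            }
        }
        return sb.toString();
    }

    // Translates codon by codon starting at a 0-based offset. An incomplete
    // trailing codon is dropped; a codon containing an unknown base becomes X.
    static String translate(String dna, int offset) {
        String seq = dna.toUpperCase(Locale.ROOT);
        StringBuilder protein = new StringBuilder();
        for (int i = offset; i + 3 <= seq.length(); i += 3) {
            int a = BASES.indexOf(seq.charAt(i));
            int b = BASES.indexOf(seq.charAt(i + 1));
            int c = BASES.indexOf(seq.charAt(i + 2));
            protein.append(a < 0 || b < 0 || c < 0 ? 'X' : AMINO_ACIDS.charAt(16 * a + 4 * b + c));
        }
        return protein.toString();
    }

    // Naive six-frame translation: frames +1..+3 on the given strand,
    // frames -1..-3 on its reverse complement.
    static Map<String, String> sixFrames(String dna) {
        Map<String, String> frames = new LinkedHashMap<>();
        String rc = reverseComplement(dna);
        for (int f = 0; f < 3; f++) {
            frames.put("+" + (f + 1), translate(dna, f));
            frames.put("-" + (f + 1), translate(rc, f));
        }
        return frames;
    }

    // Assumed decoy strategy for this sketch: the reversed protein sequence.
    static String decoy(String protein) {
        return new StringBuilder(protein).reverse().toString();
    }

    // Maps a peptide (0-based start and length in amino acids within a frame)
    // back to 1-based, inclusive genome coordinates and formats one GFF3 line.
    static String toGff3(String seqId, int genomeLength, String frame,
                         int aaStart, int aaLength, String id) {
        int offset = Integer.parseInt(frame.substring(1)) - 1;
        int from = offset + 3 * aaStart;              // 0-based, inclusive, on the translated strand
        int to = offset + 3 * (aaStart + aaLength);   // 0-based, exclusive
        boolean forward = frame.charAt(0) == '+';
        int start = forward ? from + 1 : genomeLength - to + 1;
        int end = forward ? to : genomeLength - from;
        return String.join("\t", seqId, "sketch", "sequence_feature",
                String.valueOf(start), String.valueOf(end),
                ".", forward ? "+" : "-", ".", "ID=" + id);
    }

    public static void main(String[] args) {
        String genome = "ATGGCCTAA"; // toy sequence, not real data
        sixFrames(genome).forEach((frame, protein) ->
                System.out.println(frame + "\t" + protein + "\tdecoy: " + decoy(protein)));
        System.out.println(toGff3("toy_genome", genome.length(), "-1", 0, 3, "pep1"));
    }
}
```

For a peptide found in a reverse-strand frame, the coordinates are mirrored against the genome length so that the GFF3 line always refers to the forward strand, with the strand column indicating where the peptide is encoded. A real database would normally also split each frame at stop codons before the search.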