Combining protein ratio p-values yields a useful pragmatic approach to the analysis of multi-run iTRAQ experiments (#15)

Dana Pascovici¹, Xiaomin Song, Edmond Breen, Jemma Wu, Mark Molloy

Australian Proteome Analysis Facility, Macquarie University, NSW, Australia

The promise of iTRAQ has been tempered by some well-documented difficulties, including ratio compression and run variability, which have made the analysis of iTRAQ experiments challenging, particularly in the multi-run scenario. From a statistical standpoint, a main difficulty in working with protein ratios in the presence of variability across experiments is that the protein ratios are not “all born equal”: their credibility varies with the number and quality of the peptides used to generate them. Whilst one can easily measure the credibility of a protein ratio using confidence intervals or p-values, it is hard to integrate such measures directly alongside the ratios in a standard statistical analysis such as ANOVA. Hence more sophisticated methods of statistical analysis, such as those introduced by Hill and Oberg [1,2], revert to the peptide ratios rather than working directly with the protein ratios. These methods yield complex ANOVA models whose solution relies on computational approaches such as stage-wise regression, which are non-trivial to run and harder to verify.

A more pragmatic approach is to generate combined measures of ratio confidence across experiments, in a fashion similar to running a meta-analysis across different iTRAQ runs. We present and evaluate such an analysis method, which relies on combining p-values for the iTRAQ ratios using a measure such as Stouffer’s Z-transform test [3] alongside a run consistency measure. The core advantages are simplicity, high tolerance of run variability, and emphasis on proteins with high identification confidence. We show some limitations on the types of experimental design that can be tackled using this approach, and also explore the applicability of multiple testing correction procedures to iTRAQ protein-level data. The main example iTRAQ dataset comes from a large multi-run pathogen exposure time course in wheat leaves.
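As an illustration of the combination step, the following is a minimal Python sketch of Stouffer's Z-transform test applied to the per-run p-values of a single protein. It is not the authors' implementation. It assumes that each run supplies a one-sided p-value tested in the same direction (for example, up-regulation) and that the runs are independent. The function names, the optional weights, and in particular the `direction_consistency` measure are hypothetical stand-ins, since the abstract does not specify how run consistency is computed.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def stouffer_combine(p_values, weights=None):
    """Combine one-sided p-values from independent runs (Stouffer's Z method).

    Returns (combined_z, combined_p). Weights are optional; equal weights
    give the unweighted Z-transform test.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")

    eps = 1e-15  # keep p strictly inside (0, 1) so inv_cdf is defined
    z_scores = [-_STD_NORMAL.inv_cdf(min(max(p, eps), 1.0 - eps))
                for p in p_values]

    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sum(w * w for w in weights) ** 0.5
    combined_z = numerator / denominator
    combined_p = _STD_NORMAL.cdf(-combined_z)  # upper-tail probability
    return combined_z, combined_p


def direction_consistency(log_ratios):
    """Hypothetical run consistency measure (an assumption, not from the text):
    the fraction of runs whose log ratio shares the majority direction."""
    if not log_ratios:
        raise ValueError("at least one log ratio is required")
    up = sum(1 for r in log_ratios if r > 0)
    down = sum(1 for r in log_ratios if r < 0)
    return max(up, down) / len(log_ratios)


# Example: one protein quantified in three runs (made-up numbers).
z, p = stouffer_combine([0.04, 0.10, 0.03])
consistency = direction_consistency([0.45, 0.30, 0.52])
```

Two-sided p-values would first need converting to one-sided values in a common direction, because the Z-transform relies on the sign of each run's evidence. A protein could then be flagged only when both the combined p-value is small and the consistency measure is high, although the thresholds for either are a design choice not given here.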

1. Hill, E. G.; Schwacke, J. H.; Comte-Walters, S.; Slate, E. H.; Oberg, A. L.; Eckel-Passow, J. E.; Therneau, T. M.; Schey, K. L., A statistical model for iTRAQ data analysis. Journal of Proteome Research 2008, 7 (8), 3091-3101.
2. Oberg, A. L.; Mahoney, D. W.; Eckel-Passow, J. E.; Malone, C. J.; Wolfinger, R. D.; Hill, E. G.; Cooper, L. T.; Onuma, O. K.; Spiro, C.; Therneau, T. M., Statistical analysis of relative labeled mass spectrometry data from complex samples using ANOVA. Journal of Proteome Research 2008, 7 (1), 225-233.
3. Whitlock, M. C., Combining probability from independent tests: the weighted Z-method is superior to Fisher's approach. Journal of Evolutionary Biology 2005, 18 (5), 1368-1373.