Discovery of gene interactions by GPU-enabled computation of pairwise expression level metafeatures (#24)
Background: Intrinsic across-sample variability and noise in microarray gene expression data mask the discovery of the variables of interest associated with the dysregulation of the biomolecular network in disease. Here we explore the use of interactions (differences, ratios, etc.) between gene expression levels as the independent variables, through their massive computation and characterisation using GPUs. We show the utility of this approach by applying it to the analysis of two complex gene expression datasets.
Methods: We developed a GPU-based framework for the high-performance computation and analysis of metafeatures, each constructed as a function of a pair of features, for all feature pairs in a dataset. We use meta-programming techniques to tailor the GPU components of the computation at run time to the function being tested. We test and rank metafeatures using the CM1 score [1,2] and characterise the most significant metafeatures by their combined sample classification power.
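The all-pairs computation described above can be illustrated with a minimal CPU sketch in NumPy. It is not the GPU framework itself: the names `PAIR_FUNCTIONS`, `pairwise_metafeatures`, `placeholder_score` and `rank_pairs` are hypothetical, the dictionary of callables only stands in for the run-time tailoring of GPU code, and the ranking score is a simple placeholder rather than the CM1 score of [1,2].

```python
import numpy as np

# Hypothetical pair functions; the abstract names differences and ratios.
PAIR_FUNCTIONS = {
    "difference": lambda a, b: a - b,
    "ratio": lambda a, b: a / b,
}


def pairwise_metafeatures(X, func):
    """Apply func to every feature pair (i < j).

    X has shape (n_features, n_samples). Returns the (n_pairs, 2) index
    array and the (n_pairs, n_samples) matrix of metafeature values.
    """
    i, j = np.triu_indices(X.shape[0], k=1)
    return np.stack([i, j], axis=1), func(X[i], X[j])


def placeholder_score(M, labels):
    """Stand-in ranking score (an assumption, NOT the CM1 score):
    absolute gap between class means divided by 1 + overall range."""
    in_class = np.asarray(labels).astype(bool)
    gap = M[:, in_class].mean(axis=1) - M[:, ~in_class].mean(axis=1)
    spread = M.max(axis=1) - M.min(axis=1)
    return np.abs(gap) / (1.0 + spread)


def rank_pairs(X, labels, func, top=5):
    """Return the top-scoring pairs as (i, j, score) tuples."""
    pairs, M = pairwise_metafeatures(X, func)
    scores = placeholder_score(M, labels)
    order = np.argsort(-scores)[:top]
    return [(int(pairs[k, 0]), int(pairs[k, 1]), float(scores[k])) for k in order]
```

For p features the sketch materialises p(p-1)/2 rows at once, which is only practical for small p; avoiding that memory cost is one reason a streaming GPU implementation is attractive for full datasets.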
Results: Compared with a distributed parallel implementation of the proposed method in R, our framework achieves wall-clock speedups of 25x and resource savings of 16x to 20x. The GPU tool can quickly analyse large datasets on relatively modest hardware. The metafeatures selected by the method provide greater classification power than gene probesets alone. In a breast cancer dataset with consistent molecular subtype labelling, we obtain an average Cramér's V of 0.91 ± 0.04 for the top-ranked differences vs. 0.73 ± 0.06 for the top-ranked probes.
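For readers unfamiliar with the statistic reported above, the following is a minimal sketch of the standard (uncorrected) Cramér's V between two categorical labellings, for example predicted classes against subtype labels. The function name `cramers_v` is hypothetical, and the abstract does not state how the class assignments being compared were obtained or whether a bias correction was applied.

```python
import numpy as np


def cramers_v(labels_a, labels_b):
    """Cramér's V for two equal-length sequences of category labels."""
    a_vals, a_idx = np.unique(np.asarray(labels_a).ravel(), return_inverse=True)
    b_vals, b_idx = np.unique(np.asarray(labels_b).ravel(), return_inverse=True)

    # Contingency table of co-occurrence counts.
    table = np.zeros((a_vals.size, b_vals.size))
    np.add.at(table, (a_idx, b_idx), 1)

    n = table.sum()
    # Expected counts under independence; all positive because every
    # observed category occurs at least once.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()

    k = min(table.shape) - 1
    if k == 0:
        return 0.0  # only one category on one side: V is undefined, report 0
    return float(np.sqrt(chi2 / (n * k)))
```

V ranges from 0 (no association) to 1 (each category of one labelling maps to exactly one category of the other).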
Conclusions: Exploring pairwise combinations of expression levels in the analysis pipeline of a disease or its subtypes proves a useful tool both for the search for biomarkers and for the discovery of lesser-known interactions affected by confounding factors. Our GPU-based framework makes this kind of analysis available without specialised computing environments.
- 1. John Marsden, David Budden, Hugh Craig, and Pablo Moscato. Language individuation and marker words: Shakespeare and his Maxwell's demon. PLoS ONE, 8(6):e66813, June 2013.
- 2. Heloisa Milioli, Renato Vimieiro, Carlos Riveros, and Pablo Moscato. Identification of novel biomarkers for the prediction of breast cancer intrinsic subtypes from the METABRIC transcriptomic data set. PLoS ONE, (Submitted), January 2014.