Network-based biomarkers enhance classical approaches to prognostic gene expression signatures (#37)
Classical approaches to predicting patient clinical outcome from gene expression data are based on differential expression of unrelated genes (single-gene approaches) or of genes related by biological features (gene-set approaches). Recently, network-based approaches that use interaction information between genes have emerged. An open problem is whether such approaches add value to traditional modelling methods. We explore this question by comparing single-gene, gene-set, and network-based methods, using gene expression microarray data from two cancers.
We consider two general network approaches. The first identifies informative genes using gene expression together with network information drawn from prior knowledge of protein-protein interactions (PPIs). In the second, classification features are small networks of interacting proteins (again identified from prior knowledge) or are derived from such networks, e.g. by considering edges (interactions) or hubs (highly connected proteins). For all methods we perform 100 rounds of 5-fold cross-validation under three different classifiers. For network-based approaches, we consider two PPI networks. We quantify the resulting patterns of misclassification and discuss the relative value of each method with respect to the ongoing development of prognostic biomarkers.
We find that single-gene, gene-set and network methods yield similar classification error rates across cancer data sets. Crucially, however, our detailed patient-level analyses reveal that the different methods correctly classify different subsets of patients within each cohort. We also find that the network-based NetRank feature selection method is the most stable.
Network-based methods of signature modelling harness data from external sources and are anticipated to become a standard mode of analysis. But do they add to traditional approaches? Our findings indicate there is value in the fact that the various methods capture different subspaces of the patient sample, highlighting the possibility of ‘combination’ classifiers capable of identifying which patients will be more accurately classified by one type of method than by another.
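The evaluation protocol described above (repeated rounds of 5-fold cross-validation, followed by patient-level inspection of misclassification) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`stratified_folds`, `nearest_centroid_predict`, `repeated_cv_error`) are hypothetical, and the nearest-centroid rule is only a stand-in, since the three classifiers used in the study are not named here. The sketch assumes a feature matrix given as a list of per-patient feature lists and a list of class labels.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds with roughly equal class proportions."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        # Deal the shuffled members of each class round-robin into the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def nearest_centroid_predict(X, y, train, test):
    """Stand-in classifier (an assumption): assign each test sample to the
    class whose training mean is closest in squared Euclidean distance."""
    sums, counts = {}, defaultdict(int)
    for i in train:
        c = y[i]
        if c not in sums:
            sums[c] = [0.0] * len(X[i])
        sums[c] = [s + v for s, v in zip(sums[c], X[i])]
        counts[c] += 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def dist(i, c):
        return sum((a - b) ** 2 for a, b in zip(X[i], centroids[c]))

    return {i: min(sorted(centroids), key=lambda c: dist(i, c)) for i in test}


def repeated_cv_error(X, y, rounds=100, k=5, seed=0):
    """Return, for each patient, the fraction of rounds in which that patient
    was misclassified while held out. Each patient is tested once per round."""
    rng = random.Random(seed)
    wrong = [0] * len(y)
    for _ in range(rounds):
        folds = stratified_folds(y, k, rng)
        for f in range(k):
            test = folds[f]
            train = [i for g in range(k) if g != f for i in folds[g]]
            for i, pred in nearest_centroid_predict(X, y, train, test).items():
                if pred != y[i]:
                    wrong[i] += 1
    return [w / rounds for w in wrong]
```

Running `repeated_cv_error` once per feature type (single-gene, gene-set, network-derived) on the same cohort gives one per-patient error vector per method. Comparing those vectors, rather than only their means, is what exposes the case where methods have similar overall error rates yet misclassify different patients.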