A Gene Set Method to Predict Patient Survival Risks from Gene Expression Data (#88)
Background: Gene sets representing modules of biological function, such as pathways and transcriptional regulation, have been actively studied and applied in the analysis of high-throughput gene expression data in clinical studies. While many previous studies have focused on finding gene sets significantly associated with disease conditions, the use of gene sets for predicting patient risk from censored survival times has not been well studied.
Results: In this work, we propose a method that uses gene sets to predict patient survival risks from gene expression profiles. The method summarizes the expression indices of the member genes of each gene set and incorporates both single-gene and gene-set information into the framework of conventional prediction methods. Tested on multiple data sets of cancer and severe injury, the method shows significantly improved prediction power for patient survival risks compared with conventional single-gene prediction, and prediction performance appears to benefit from using an integrated super-collection of multiple available gene set collections. Detailed examination of the prediction results in the injury data shows that the gene sets selected by the method are highly interpretable biologically.
Conclusions: To date, most outcome predictions using gene expression data have focused on single-gene information. The method developed here, which utilizes gene set information, is expected to be applicable to a wide range of survival prediction problems in clinical genomics and personalized medicine.
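The gene set summarization step described in the Results can be illustrated with a minimal sketch. The abstract does not say which summary statistic is used, so the mean of per-gene z-scores below is an assumption chosen for simplicity, and the names `summarize_gene_sets`, `combined_features`, `SET_A`, and `SET_B` are hypothetical. The sketch only shows how gene-set scores could be placed alongside single-gene features before they are passed to a conventional survival predictor; it is not the authors' implementation.

```python
import numpy as np

def summarize_gene_sets(expr, gene_names, gene_sets):
    """Return (set_names, scores), where scores is samples x n_sets.

    expr: samples x genes array of expression indices.
    Assumption: a set's score is the mean z-score of its member genes.
    """
    expr = np.asarray(expr, dtype=float)
    mu = expr.mean(axis=0)
    sd = expr.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant genes
    z = (expr - mu) / sd

    index = {g: j for j, g in enumerate(gene_names)}
    names, scores = [], []
    for set_name, members in gene_sets.items():
        cols = [index[g] for g in members if g in index]
        if not cols:
            continue  # skip sets with no measured member genes
        names.append(set_name)
        scores.append(z[:, cols].mean(axis=1))

    if not scores:
        return names, np.empty((expr.shape[0], 0))
    return names, np.column_stack(scores)

def combined_features(expr, gene_names, gene_sets):
    """Concatenate single-gene columns with gene-set summary columns."""
    set_names, set_scores = summarize_gene_sets(expr, gene_names, gene_sets)
    features = np.hstack([np.asarray(expr, dtype=float), set_scores])
    return list(gene_names) + set_names, features

# Toy data: 4 samples, 3 genes (values are illustrative only).
expr = np.array([[1.0, 2.0, 0.5],
                 [2.0, 1.0, 0.7],
                 [3.0, 4.0, 0.2],
                 [4.0, 3.0, 0.9]])
gene_names = ["g1", "g2", "g3"]
gene_sets = {"SET_A": ["g1", "g2"], "SET_B": ["g9"]}  # SET_B has no measured genes

names, features = combined_features(expr, gene_names, gene_sets)
```

The combined matrix could then be fed to any conventional survival model in place of the single-gene matrix alone; which model and which feature selection procedure to use are left open here, as the abstract does not specify them.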