Effects of sample size and unbalance on finding cancer biomarker (#253)
Background: Cancer biomarkers play an important role in cancer diagnosis and treatment, but very few robust molecular cancer biomarkers have been discovered in recent decades. One reason is that many researchers do not adequately analyze their clinical cancer samples and overlook the effects of sample size and unbalance on finding robust biomarkers. Here we study these effects on finding cancer biomarker genes in order to draw more researchers' attention to them.
Methods: We identified a large number of prognostic biomarker gene sets from randomly selected breast cancer data sets with different sample sizes and ratios using a survival risk analysis method, and evaluated their stability and robustness in 8 breast cancer data sets using the proposed evaluation method.
Results: Experimental results show that the number, stability and robustness of the identified biomarker genes change significantly when sample size and ratio change.
Conclusions: Sample size and unbalance have significant effects on finding stable and robust biomarker genes. A large number of cancer samples is necessary to find high-quality biomarker genes. The larger the sample size of the training data sets, the more robust the identified biomarker genes are. In addition, it is critical to keep an appropriate ratio of different types of cancer samples to overcome the negative effects of sample unbalance. The identified biomarker genes are generally more robust when the sample ratio is near 1. Data sets from different laboratories also have important effects on finding and testing biomarker gene sets.
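As an illustration of the random selection step described in Methods, the following is a minimal sketch of drawing a training subset with a chosen sample size and sample ratio. It is not the authors' implementation. The function name `subsample_by_ratio`, the binary labels (1 and 0 standing for two types of cancer samples), and the definition of the ratio as the count of class 1 divided by the count of class 0 are all assumptions made for this example.

```python
import random


def subsample_by_ratio(labels, sample_size, ratio, seed=0):
    """Pick indices of a random subset with a given size and class ratio.

    Assumptions (hypothetical, not taken from the paper):
      - labels holds 1 or 0 for the two sample types
      - ratio = (number of class-1 samples) / (number of class-0 samples)
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]

    # Split the requested size between the two classes according to the ratio.
    n_pos = round(sample_size * ratio / (1 + ratio))
    n_neg = sample_size - n_pos

    if n_pos > len(pos) or n_neg > len(neg):
        raise ValueError("not enough samples of one class for this size and ratio")

    # Sample without replacement inside each class, then merge.
    return sorted(rng.sample(pos, n_pos) + rng.sample(neg, n_neg))


# Toy labels standing in for a real cohort: 100 samples of each type.
toy_labels = [1] * 100 + [0] * 100
balanced = subsample_by_ratio(toy_labels, sample_size=60, ratio=1.0, seed=1)
unbalanced = subsample_by_ratio(toy_labels, sample_size=60, ratio=0.5, seed=1)
```

Repeating such draws over several sizes and ratios gives the families of training sets whose resulting gene sets can then be compared for stability and robustness.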