Identifying significant associations with interacting germline variation and somatic mutational events for cancers — ASN Events

Identifying significant associations with interacting germline variation and somatic mutational events for cancers (#75)

Zhongmeng Zhao 1, Xuanping Zhang 1, Wenke Wang 1, Yu Geng 1, Mingchao Xie 2, Beifang Niu 2, Kai Ye 2, Kimberly Johnson 3, Li Ding 2, Xiao Xiao 4, Jiayin Wang 2
  1. Department of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, P.R. China
  2. The Genome Institute, Washington University in St. Louis, St. Louis, MO, United States
  3. Brown School Master of Public Health Program, Washington University in St. Louis, St. Louis, MO, United States
  4. State Key Laboratory of Cancer Biology, Xijing Hospital of Digestive Diseases, Xi'an, P.R. China

Background: Identifying novel deleterious germline variants and somatic events is one of the essential problems in cancer genomics. A series of association approaches have been proposed for this purpose, among which burden-test-based methods are the most popular. However, these methods face several challenges: they depend heavily on pre-selected genetic models, have difficulty differentiating deleterious variants from neutral ones, and suffer from low statistical power. Moreover, interactions between germline and somatic variation have been widely reported recently, but burden tests do not take them into account.
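For readers unfamiliar with burden tests, the following is a minimal sketch of the general collapsing idea, not the method proposed in this abstract. It assumes a toy genotype matrix (rows are samples, columns are rare variants, entries are minor-allele counts) and a one-sided Fisher exact test on carrier status; the function names and the data are hypothetical and chosen only for illustration.

```python
from math import comb


def collapse_carriers(genotypes):
    """Collapse each sample's rare-variant genotypes into one carrier flag (0 or 1)."""
    return [int(any(count > 0 for count in sample)) for sample in genotypes]


def fisher_one_sided(a, b, c, d):
    """Return P(X >= a) for the 2x2 table [[a, b], [c, d]] under the hypergeometric null.

    a = case carriers, b = case non-carriers,
    c = control carriers, d = control non-carriers.
    """
    n = a + b + c + d
    cases = a + b
    carriers = a + c
    upper = min(cases, carriers)
    tail = sum(
        comb(carriers, k) * comb(n - carriers, cases - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, cases)


def burden_test(genotypes, is_case):
    """Collapse variants per sample, then test carrier enrichment among cases."""
    carrier = collapse_carriers(genotypes)
    a = sum(1 for flag, case in zip(carrier, is_case) if flag and case)
    b = sum(1 for flag, case in zip(carrier, is_case) if not flag and case)
    c = sum(1 for flag, case in zip(carrier, is_case) if flag and not case)
    d = sum(1 for flag, case in zip(carrier, is_case) if not flag and not case)
    return fisher_one_sided(a, b, c, d)


# Hypothetical toy data: 4 cases followed by 4 controls, 3 rare variants.
toy_genotypes = [
    [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0],  # cases
    [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0],  # controls
]
toy_is_case = [True, True, True, True, False, False, False, False]

p_value = burden_test(toy_genotypes, toy_is_case)
print(f"one-sided p-value: {p_value:.4f}")
```

Because every variant in the region is collapsed with equal weight, a sketch like this cannot distinguish deleterious variants from neutral ones, which is one of the limitations noted above.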
Results: Motivated by these issues, we propose a novel association approach that identifies deleterious variants by combining germline variants and somatic mutational events from cancer genome sequencing data. As a model-free strategy, our approach, RareProb-C, algorithmically selects causal variants and eliminates singular cases, and then collapses the candidate causal mutations into a statistical test. In addition, an improved four-gamete test is introduced to enhance accuracy and reduce false positives. We compare RareProb-C with existing burden-test approaches on both artificial and real datasets. RareProb-C achieves higher statistical power than the existing approaches under different simulation configurations. We apply RareProb-C to an ATM gene screening dataset and an ovarian cancer research dataset consisting of 419 cases with tumor-normal paired Exome-Seq data, where our approach successfully identifies most of the highlighted variants that are considered to contribute to disease susceptibility.
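The abstract does not describe how the four-gamete test is improved, so the sketch below shows only the classical four-gamete test as background. It assumes phased, biallelic haplotypes coded as 0/1 (rows are haplotypes, columns are sites); the function names and toy data are hypothetical. A pair of sites is flagged when all four allele combinations (00, 01, 10, 11) are observed, which under the infinite-sites assumption indicates recombination or a recurrent mutation between the two sites.

```python
from itertools import combinations

# The four possible two-site gametes for biallelic sites coded 0/1.
ALL_GAMETES = {(0, 0), (0, 1), (1, 0), (1, 1)}


def violates_four_gamete(site_a, site_b):
    """Return True if all four gametes occur across the paired haplotypes."""
    return set(zip(site_a, site_b)) >= ALL_GAMETES


def incompatible_pairs(haplotypes):
    """List index pairs (i, j) of sites that fail the classical four-gamete test."""
    sites = list(zip(*haplotypes))  # transpose: one tuple of alleles per site
    return [
        (i, j)
        for i, j in combinations(range(len(sites)), 2)
        if violates_four_gamete(sites[i], sites[j])
    ]


# Hypothetical toy data: 4 phased haplotypes over 3 sites.
toy_haplotypes = [
    [0, 0, 0],
    [0, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
]

print(incompatible_pairs(toy_haplotypes))
```

In this toy example, sites 0 and 2 always carry the same allele and are therefore compatible, while site 1 shows all four gametes with each of them.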