Molecular profiling of thyroid cancer subtypes using large-scale text mining — ASN Events

Molecular profiling of thyroid cancer subtypes using large-scale text mining (#72)

Chengkun Wu (1, 2, 3), Jean-Marc Schwartz (1), Georg Brabant (4, 5), Goran Nenadic (3, 6, 7)
  1. Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
  2. Doctoral Training Centre in Integrative Systems Biology, University of Manchester, Manchester, United Kingdom
  3. Manchester Institute of Biotechnology, University of Manchester, Manchester, United Kingdom
  4. Department of Endocrinology, Christie Hospital, University of Manchester, Manchester, United Kingdom
  5. Experimental and Clinical Endocrinology, Med Clinic I, University of Lübeck, Lübeck, Germany
  6. School of Computer Science, University of Manchester, Manchester, United Kingdom
  7. Health eResearch Centre (HeRC), University of Manchester, Manchester, United Kingdom
Background

Thyroid cancer is the most common endocrine tumor, and its incidence is rising steadily. It is classified into multiple histopathological subtypes with potentially distinct molecular mechanisms. Identifying the most relevant genes and biological pathways reported in the thyroid cancer literature is vital for understanding the disease and developing targeted therapeutics.

Results

We developed a large-scale text mining system to generate molecular profiles of thyroid cancer subtypes. The system first classifies the thyroid cancer literature by subtype, using a scoring scheme to assign subtypes to articles. We evaluated the classification method against a gold standard derived from PubMed Supplementary Concept annotations, achieving an F1-score above 80% for most subtypes. We then used the classification results to extract the genes and pathways associated with each thyroid cancer subtype.
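
The abstract does not specify the scoring scheme, so the following is only a minimal sketch of how score-based subtype assignment and per-subtype F1 evaluation could look. The term lexicon, the weights, the threshold, and the function names (`score_subtypes`, `assign_subtypes`, `f1_score`) are all illustrative assumptions and are not taken from the study or its repository.

```python
import re

# Hypothetical subtype lexicon. Terms and weights are illustrative
# assumptions, not the ones used in the study.
SUBTYPE_TERMS = {
    "papillary": {"papillary thyroid carcinoma": 2.0, "ptc": 1.0},
    "follicular": {"follicular thyroid carcinoma": 2.0, "ftc": 1.0},
    "medullary": {"medullary thyroid carcinoma": 2.0, "mtc": 1.0},
    "anaplastic": {"anaplastic thyroid carcinoma": 2.0, "atc": 1.0},
}


def score_subtypes(text):
    """Score each subtype as the sum of weight * whole-word match count."""
    lowered = text.lower()
    scores = {}
    for subtype, terms in SUBTYPE_TERMS.items():
        total = 0.0
        for term, weight in terms.items():
            pattern = r"\b" + re.escape(term) + r"\b"
            total += weight * len(re.findall(pattern, lowered))
        scores[subtype] = total
    return scores


def assign_subtypes(text, threshold=2.0):
    """Assign every subtype whose score reaches the assumed threshold."""
    return {s for s, v in score_subtypes(text).items() if v >= threshold}


def f1_score(predicted, gold, subtype):
    """F1 for one subtype over parallel lists of predicted and gold label sets."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if subtype in p and subtype in g)
    fp = sum(1 for p, g in pairs if subtype in p and subtype not in g)
    fn = sum(1 for p, g in pairs if subtype not in p and subtype in g)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    abstract = "Papillary thyroid carcinoma (PTC) frequently harbours BRAF mutations."
    print(assign_subtypes(abstract))
```

In this sketch an article may receive several subtypes or none, and the threshold controls how much evidence is required before a label is assigned. Evaluating against a gold standard then reduces to calling `f1_score` once per subtype on the predicted and annotated label sets.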

Conclusions

Identifying key genes and pathways is central to understanding the molecular biology of thyroid cancer. Integrating subtype context will allow prioritized screening for diagnostic biomarkers and novel molecularly targeted therapeutics. The source code used for this study is freely available online at https://github.com/chengkun-wu/GenesThyCan.