Intrinsic properties of genomic sequences allow prediction of transcription factor binding regions  — ASN Events

Intrinsic properties of genomic sequences allow prediction of transcription factor binding regions  (#268)

Tsung-Yeh Zing Tsai¹, Shin-Han Shiu², Huai-Kuang Tsai¹
  1. Academia Sinica, Taipei, Taiwan
  2. Michigan State University, Michigan, USA
Transcription factor (TF) binding is determined by multiple factors, including sequence specificity and chromatin accessibility, where the latter is influenced by both chromatin state and DNA structural properties. Although these features can be used to predict TF binding sites, their relative and joint contributions remain unclear. In particular, because some of these features can be predicted from genomic sequence alone, it remains an open question how well such sequence-derived features can predict binding regions. In a systematic assessment of 23 features, considered individually and jointly, for predicting TF binding preference, we found that chromatin state and DNA structural properties are better predictors of binding than the TF's sequence motif. In addition, considering chromatin state and DNA structural properties simultaneously further improves the accuracy of TF binding prediction, indicating that these two feature sets are highly synergistic. However, their relative contributions differ greatly between TFs. Most importantly, we show that three intrinsic DNA properties are particularly critical in predicting TF binding. Using a model based on these intrinsic properties, we can predict binding regions not only across TFs but also across DNA-binding domain families with distinct structural folds. Because these DNA-binding domain families are present in most eukaryotes, this suggests that the intrinsic property model is likely universal and can be applied across species. Thus, our findings demonstrate the feasibility of establishing a universal model for identifying regulatory regions in any sequenced genome.
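The abstract does not describe its implementation. As a minimal sketch of the kind of comparison it reports (scoring genomic windows with sequence-derived features, fitting a model on individual and joint feature sets, and comparing accuracy on held-out bound and unbound regions), the Python below makes several assumptions that do not come from the study. The two features, GC content and CpG observed/expected ratio, are illustrative choices and not necessarily among the three intrinsic properties the authors identify. Logistic regression stands in as the classifier and ROC AUC as the accuracy measure. All function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Illustrative sequence-only features (an assumption, not the authors' three properties).
def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T) in a window."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "GC" for b in bases) / len(bases) if bases else 0.0

def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: count(CG) * length / (count(C) * count(G))."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    cg = sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "CG")
    return cg * len(s) / (c * g)

FEATURES: Dict[str, Callable[[str], float]] = {"gc": gc_content, "cpg_oe": cpg_obs_exp}

def featurize(seqs: Sequence[str], names: Sequence[str]) -> List[List[float]]:
    """One row per window, one column per selected feature."""
    return [[FEATURES[n](s) for n in names] for s in seqs]

def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X: List[List[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Batch gradient descent for logistic regression; returns [bias, w1, ..., wk]."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = _sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict(w: List[float], X: List[List[float]]) -> List[float]:
    """Predicted probability of binding for each window."""
    return [_sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) for xi in X]

def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a bound window (1) outscores an unbound one (0); ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one bound and one unbound window")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

Split = Tuple[Sequence[str], Sequence[int]]  # (window sequences, bound=1 / unbound=0)

def evaluate(train: Split, test: Split, names: Sequence[str]) -> float:
    """Fit on training windows, report AUC on held-out test windows."""
    w = fit_logistic(featurize(train[0], names), train[1])
    return roc_auc(predict(w, featurize(test[0], names)), test[1])
```

Calling `evaluate` on the same train/test split with `["gc"]`, `["cpg_oe"]`, and `["gc", "cpg_oe"]` compares single and joint feature sets, mirroring the individual-versus-joint assessment described above. In practice, real window sequences with bound/unbound labels (for example, derived from ChIP-seq peaks) and a proper cross-validation scheme would replace toy inputs.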