Intrinsic properties of genomic sequences allow prediction of transcription factor binding regions (#268)
Transcription factor (TF) binding is determined by multiple
factors, including sequence specificity and chromatin accessibility, where the
latter is influenced by both chromatin state and DNA structural properties.
Although these features can be used to predict TF binding sites, their relative
and joint contributions remain unclear. In particular, given that some of these
features can be predicted based on genomic sequence alone, it remains an open
question how well they can be applied to predicting binding regions. Through a
systematic assessment of the impact of jointly considering 23 features in
predicting TF binding preference, we show that chromatin state and DNA
structural properties are better predictors of binding than the sequence motif
of a TF. In addition,
simultaneously considering chromatin state and DNA structural properties
further improves the accuracy of TF binding prediction, indicating that these
two feature sets are highly synergistic. However, their relative contributions
differ greatly between TFs. Most importantly, we show that three DNA intrinsic
properties are particularly critical in predicting TF binding. Using the
intrinsic property model, we can predict binding regions not only across TFs but also
across DNA-binding domain families with distinct structural folds. The
intrinsic property model allows TF binding predictions across DNA-binding
domain families that are present in most eukaryotes, suggesting that the model
is likely universal and can be used across species. Thus,
our findings demonstrate the feasibility of establishing a universal model for
identifying regulatory regions in any sequenced genome.