Question: Has Enhancer And Transcription Factor Binding Site Prediction Already Been Made Redundant?
10.3 years ago by
Allpowerde1.2k wrote:

ENCODE will soon provide DNase I hypersensitivity data for the whole genome in a multitude of different tissues. DNase I hypersensitivity marks genomic positions that are exposed and can hence be used to pinpoint active promoters or enhancers in the studied tissue. DNase I resistant regions, in contrast, mark genomic areas that are protected, e.g. because a transcription factor (TF) is bound. Since the data provide base-pair resolution, it is possible to "zoom in" on the protected areas (== transcription factor binding sites) within the otherwise exposed regions (== enhancers). One can hence identify the shadow-prints left on the genome by the regulatory TFs in a given tissue. To identify which TFs are casting the shadows, one could use ChIP-seq (rough binding regions) or protein binding arrays (binding motif).
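To make the "shadow-print" idea concrete, here is a minimal Python sketch of how one might look for protected stretches inside an exposed region. It assumes a list of per-base DNase I cut counts; the function name `find_footprints`, the thresholds, and the toy numbers are all hypothetical choices for illustration, not how ENCODE or any published footprinting tool actually works.

```python
from typing import List, Tuple


def find_footprints(
    cuts: List[int],
    open_min: int = 5,       # assumed: a flank counts as "exposed" at >= this many cuts
    protected_max: int = 1,  # assumed: a base counts as "protected" at <= this many cuts
    min_len: int = 6,        # assumed: minimum footprint width in bp
    flank: int = 3,          # assumed: number of flanking bases that must be exposed
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) runs of protected bases flanked by exposed bases."""
    footprints = []
    n = len(cuts)
    i = 0
    while i < n:
        if cuts[i] <= protected_max:
            # Extend the run of protected bases as far as it goes.
            j = i
            while j < n and cuts[j] <= protected_max:
                j += 1
            left = cuts[max(0, i - flank):i]
            right = cuts[j:j + flank]
            # Keep the run only if it is long enough and both flanks are fully exposed.
            if (
                j - i >= min_len
                and len(left) == flank
                and len(right) == flank
                and min(left) >= open_min
                and min(right) >= open_min
            ):
                footprints.append((i, j))
            i = j
        else:
            i += 1
    return footprints


# Toy cut counts: closed, exposed, protected dip, exposed, closed.
toy_cuts = [0, 0, 8, 9, 7, 0, 1, 0, 0, 1, 0, 9, 8, 10, 0, 0]
print(find_footprints(toy_cuts))
```

On the toy data only the dip at positions 5 to 10 qualifies, because the low-count stretches at either end are not flanked by exposed bases on both sides. Real data would need normalization for sequencing depth and for the sequence bias of DNase I, which this sketch ignores.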

The question is: does the in-silico prediction of enhancers, binding sites, or binding partners still have merit, or will we soon be able to simply look up the binding events of TFs in the different tissues?

binding transcription • 6.2k views
ADD COMMENTlink modified 4.5 years ago by Biostar ♦♦ 20 • written 10.3 years ago by Allpowerde1.2k
10.3 years ago by
Pedrobeltrao140 wrote:

I don't work on prediction of transcription factor binding or enhancers so I will just give a very general answer that could apply to any sort of prediction.

I think there is a big difference between observing an event (e.g., a transcription factor binding to region X) and knowing why you observe it. To put it another way: if we can solve protein structures, should we still try to predict how a protein might fold? Prediction tries to encapsulate our knowledge of the system, so I think the answer is that we will never stop trying to predict or model a system, even if we can easily measure it. Until we can model it, we don't really know how it works. If you are only interested in knowing where a TF might bind, then the observations are enough, but if you want to know why a protein with those characteristics binds to that DNA region, then the observations are just the starting point.

ADD COMMENTlink written 10.3 years ago by Pedrobeltrao140
10.3 years ago by
University Park, USA
Istvan Albert ♦♦ 84k wrote:

I wouldn't venture to hypothesize on what will happen; making predictions is difficult, especially about the future.

What seems prudent to assume, however, is that there may be several reasons why genomic regions are accessible or protected. For example: is it a transcription factor that protects the region, or is there some other reason why the TF binds to that location to begin with? Chromatin structure and nucleosome positioning, for instance, may favor or disfavor certain events.

Any in-silico modeling will need to take into account the various mechanisms that may take place.

ADD COMMENTlink written 10.3 years ago by Istvan Albert ♦♦ 84k
10.2 years ago by
Kyoto, Japan
Nicojo1.1k wrote:

I agree with Istvan and Pedrobeltrao.

I would just add that we should all be careful when we look at the type of experiments that you describe as well as protein structures and many other biochemical experiments.

They are, most often, snapshots of what is happening in an extremely dynamic environment, which is the cell and its components.

What holds true at one moment (when the cell was fixed, or the proteins extracted or crystallized) is not the whole picture of what's happening or how things look.

I think you'll need more than a few biochemical experiments to know how the cell really works. Until that day, predictions and modeling will always be useful.

ADD COMMENTlink written 10.2 years ago by Nicojo1.1k
10.2 years ago by
Phis1.0k wrote:

In addition to what's already been said, I'd like to add that even if these data did completely abolish the need to predict TF binding sites (which I'm not entirely sure about), there are still many cases - and many organisms - where such data aren't available, implying that there's still a niche for computational approaches.

ADD COMMENTlink written 10.2 years ago by Phis1.0k

I was going to say that, as much as I appreciate and use ENCODE data, the less-than-one-handful of species covered by it and by modENCODE leaves a lot of ground to cover.

ADD REPLYlink written 9.4 years ago by Mary11k
10.2 years ago by
Mikael Huss4.7k wrote:

We still have a long way to go when it comes to enhancer discovery. The fact that a genomic region comes out as DNase I hypersensitive in a certain tissue does not necessarily mean it is an enhancer region in that tissue. Here, I think the DNase I hypersensitivity (and FAIRE) data should be regarded as a necessary input to improved enhancer prediction algorithms, rather than something to replace them. (In fact, I think there are very few enhancer prediction algorithms out there, so the help is sorely needed!)

Similarly, I don't think DNase or FAIRE in themselves say much about transcription factor binding, although they can be very informative in combination with knowledge of the TF motif (or so I've heard). ChIP-seq, on the other hand, does give pretty solid information on TF binding, which I agree would more or less supersede computational predictions in the relevant tissue in the given organism. As others have pointed out in this thread, though, there are many organisms and/or tissues for which we won't have ChIP-seq in the foreseeable future, and for those cases (and others) we can hopefully use existing ChIP-seq data to refine computational models of TF binding. So I would regard ChIP-seq data as something that helps us refine our understanding of TF binding, including the prediction of binding events in various systems.
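As a rough illustration of combining accessibility data with knowledge of a TF motif, here is a small Python sketch that scores a sequence against a position probability matrix and keeps only the hits that fall inside regions flagged as open. The matrix `TOY_PPM`, the function names, the score threshold, and the coordinates are all made up for the example; this is not the method of any particular tool.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical 4-bp motif with consensus TGAC (0.85 for the consensus base).
TOY_PPM: List[Dict[str, float]] = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
]


def log_odds(ppm: List[Dict[str, float]], site: str, background: float = 0.25) -> float:
    """Sum of per-position log2(motif probability / uniform background)."""
    return sum(math.log2(col[base] / background) for col, base in zip(ppm, site))


def accessible_hits(
    seq: str,
    ppm: List[Dict[str, float]],
    open_regions: List[Tuple[int, int]],  # half-open (start, end) accessible intervals
    threshold: float,
) -> List[Tuple[int, float]]:
    """Return (start, score) for motif matches lying entirely inside an open region."""
    width = len(ppm)
    hits = []
    for start in range(len(seq) - width + 1):
        end = start + width
        # Skip windows that are not fully contained in some accessible interval.
        if not any(s <= start and end <= e for s, e in open_regions):
            continue
        score = log_odds(ppm, seq[start:end])
        if score >= threshold:
            hits.append((start, score))
    return hits


toy_seq = "AATGACGGTTTGACAA"  # TGAC occurs at positions 2 and 10
open_only = accessible_hits(toy_seq, TOY_PPM, [(0, 8)], threshold=5.0)
everywhere = accessible_hits(toy_seq, TOY_PPM, [(0, len(toy_seq))], threshold=5.0)
print([start for start, _ in open_only])   # match inside the open region
print([start for start, _ in everywhere])  # all matches, ignoring accessibility
```

The sequence alone gives two equally good matches; restricting to the (assumed) open region drops the one at position 10. That is the sense in which accessibility data can feed into a prediction rather than replace it.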

ADD COMMENTlink written 10.2 years ago by Mikael Huss4.7k
9.4 years ago by
Boston, MA USA
Larry_Parnell16k wrote:

Excellent replies above. I'd like to throw some data out there. Mike Snyder (Stanford) of ENCODE has said in his talks that ~20-25% of RNA Pol II sites and ~7% of NF-κB binding sites show variable binding between any two humans. As someone who works on genetic variation, I know there is not that much variation between two human genomes at those specific sites. Why the differential binding? We don't know yet, but this makes it all the more important to keep both the lab and in silico arms of TFBS work going.

ADD COMMENTlink modified 9.4 years ago • written 9.4 years ago by Larry_Parnell16k

ADD REPLYlink written 9.2 years ago by User 348430

Larry, you mentioned Mike Snyder's talk. Have the results been published? If so, a link to the paper would be appreciated.

ADD REPLYlink written 9.4 years ago by Dejian1.3k

I don't believe those details have been published, but I also have not looked through all the ENCODE papers that came out recently. I could not link to all those papers...

ADD REPLYlink written 9.4 years ago by Larry_Parnell16k
9.2 years ago by
User 348430 wrote:

Two recent papers address this issue with DNase data:

ADD COMMENTlink written 9.2 years ago by User 348430
9.4 years ago by
United States
Dejian1.3k wrote:

Prediction will always be useful and necessary, I think. Learning about TFBSs and other biological facts will eventually boost synthetic biology and bioengineering. Predicting something first and validating it later helps improve our understanding of biology.

ADD COMMENTlink written 9.4 years ago by Dejian1.3k