Question: Best tool to find potential TF binding sites within a specific DNA sequence?
wildtype wrote, 19 months ago:

Hi, I don't have much experience with motif searches, and I would like to hear your advice on the following task:

I have a DNA sequence (~300 bp) which hypothetically contains a regulatory motif, for example a 300 bp region upstream of the TSS of a gene. There is no prior knowledge of what could be binding there, and I want to have some predictions. What tool would be best to scan for motifs similar to any of the known TF binding motifs in Drosophila? Further, what would be a good tool for submitting an alignment from multiple species and finding a conserved motif? (Again Drosophila; I don't want to find just any motif, but one corresponding to a known factor.)

Thanks in advance

tf motif sequence binding • 2.0k views
Petr Ponomarenko (United States / Los Angeles / ALAPY.com) wrote, 19 months ago:

As kennethcondon2007 said, no matter what you choose to search for TFBSs in a single sequence, you will get lots of false positives. There are several ways to reduce them:

1. Compare the promoters of multiple co-regulated genes and look for an enriched signal.
2. Search SNP databases (this is somewhat similar to conservation), as the binding sites of some TFs tend to be very conserved, with a much lower SNP probability.
3. Focus on TFBSs and/or motifs that have a very narrow window of possible locations (for example, the TATA box) and build out from them using known TF-TF interactions.
4. If you know your gene is regulated by one TF and definitely not by another, you can force TFBSs for the first one in and force those for the second one out.
5. Look up orthologous and paralogous genes, including pseudogenes; their promoter organization is sometimes conserved.
6. If you use PWMs, say from TRANSFAC with Match, make sure you know the origin of the matrix and what its direction means. For some TFBSs direction is important, and usually this is direction relative to some other nearby TFBS.

I am not an expert on Drosophila or your gene, but your gene may have alternative transcription start sites and alternative promoters. Finally, some TFs bind downstream of the TSS, and yours might be of this kind.
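
To illustrate the point about matrix direction, here is a minimal Python sketch of scanning a sequence with a PWM on both strands. The count matrix is a hypothetical toy (consensus GATA) and is not taken from TRANSFAC or any other database; the function names, the uniform background, and the threshold are likewise assumptions made for illustration. Real tools such as Match use their own scoring schemes.

```python
import math

# Hypothetical toy count matrix (rows = motif positions, columns = A, C, G, T).
# Consensus is GATA; counts already include a pseudocount of 1.
COUNTS = [
    [1, 1, 17, 1],   # G
    [17, 1, 1, 1],   # A
    [1, 1, 1, 17],   # T
    [17, 1, 1, 1],   # A
]
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds(counts, background=0.25):
    """Convert counts to log2 odds against a uniform background (an assumption)."""
    pwm = []
    for row in counts:
        total = sum(row)
        pwm.append({b: math.log2((c / total) / background)
                    for b, c in zip(BASES, row)})
    return pwm


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def scan(seq, pwm, threshold):
    """Return (start, strand, score) for every window scoring >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            if any(b not in BASES for b in window):
                continue  # skip windows containing N or other symbols
            score = sum(col[b] for col, b in zip(pwm, window))
            if score >= threshold:
                # Report minus-strand hits in forward-strand coordinates.
                start = i if strand == "+" else len(s) - width - i
                hits.append((start, strand, round(score, 2)))
    return hits
```

With this toy matrix, `scan("CCGATACC", log_odds(COUNTS), threshold=5.0)` reports a plus-strand hit at position 2, while `CCTATCCC` gives a minus-strand hit at the same position. That difference is why it matters which strand a matrix was built on.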


Thanks for the insights. I didn't explain the exact biological question, just an analogous example for simplicity. In reality it's a region within the first intron, not at the TSS. This region is well conserved across the 12 Drosophila genomes, and there's a peak of DNA accessibility in D. mel. These observations make me think that there must be some factor binding there (not necessarily a known one). I would like to check for the presence of possible known motifs there, fully aware that there could be false positives, but I don't know where else to start. There are a few other genes that seem to be co-regulated, but it could be for other reasons, so I am not sure whether adding them would help or hurt. I tried to look at ChIP-seq/ChIP-chip data for this region on the modENCODE browser, but the data are from embryos and cover only a few TFs, and there were no convincing peaks.

written 19 months ago by wildtype

Interesting. Do you see that conservation and accessibility peak in the same region in other species? Do you have access to a wet lab or funding to order some wet lab tests elsewhere, or is this a purely bioinformatics task for you?

written 19 months ago by Petr Ponomarenko

I do. I will clone the fragment and place it upstream of a reporter to see what happens, but meanwhile I wanted to check whether I could predict anything computationally.

written 18 months ago by wildtype

God, the complexity. That even scared me a bit.

written 19 months ago by YaGalbi
YaGalbi (Biocomputing, MRC Harwell Institute, Oxford, UK) wrote, 19 months ago:

The problem with a single sequence is the size of the search space. In your case you have one 300-base sequence. You have no idea of the length of possible motifs, if any, or where they occur. You would be searching a database of many motifs of many different lengths. The number of possible matches is enormous, so any results you get could be occurring completely by chance and have no biological relevance whatsoever.
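
As a rough back-of-the-envelope check on how many matches chance alone produces, the sketch below estimates the expected number of exact hits in random sequence. It assumes uniform base composition, exact matches to a fixed-length consensus, and independent positions; the function name and the figure of 500 motifs are hypothetical and do not describe any real database. PWMs tolerate mismatches, so real chance hit counts would be higher.

```python
def expected_chance_hits(seq_len, motif_len, n_motifs=1,
                         both_strands=True, p_base=0.25):
    """Expected number of exact motif matches in random sequence.

    Assumes each base occurs independently with probability p_base.
    """
    windows = seq_len - motif_len + 1
    if windows <= 0:
        return 0.0
    strands = 2 if both_strands else 1
    return n_motifs * strands * windows * p_base ** motif_len


# One 6-mer in a 300 bp sequence, both strands: about 0.14 chance hits.
single = expected_chance_hits(300, 6)

# A hypothetical collection of 500 distinct 6-mers: about 72 chance hits.
many = expected_chance_hits(300, 6, n_motifs=500)
```

Under these assumptions, even short exact motifs are expected to match dozens of times by chance once a whole motif collection is scanned against 300 bases.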

A motif search is usually carried out on a group of related sequences (not a single sequence) to find short seeds that are enriched. For example, if you have a set of 10 co-expressed genes, you can extract the 300 bases upstream of each TSS. This set of 10 x 300-base sequences can then be analysed for short enriched fragments. It is those short fragments that are then used as queries against a database of TF motifs.
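
The enrichment idea can be sketched in a few lines of Python. This is only an illustration, not how MEME, HOMER, or i-cisTarget work internally: it counts how many sequences contain each k-mer in a foreground set versus a background set and ranks k-mers by the ratio. The function names, the pseudocount, and the single-strand counting are all simplifying assumptions, and no statistical test is applied.

```python
from collections import Counter


def kmers_present(seq, k):
    """Set of distinct k-mers in one sequence (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def enriched_kmers(foreground, background, k=6, top=5):
    """Rank k-mers by (fraction of foreground sequences containing them)
    divided by (pseudocounted fraction of background sequences)."""
    fg, bg = Counter(), Counter()
    for s in foreground:
        fg.update(kmers_present(s, k))
    for s in background:
        bg.update(kmers_present(s, k))
    scored = []
    for kmer, n_fg in fg.items():
        fg_frac = n_fg / len(foreground)
        bg_frac = (bg[kmer] + 1) / (len(background) + 1)  # avoid division by zero
        scored.append((fg_frac / bg_frac, kmer, n_fg))
    # Highest ratio first; ties broken alphabetically for reproducibility.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored[:top]


# Made-up toy sequences, for illustration only.
foreground = ["AAGATAAGCC", "CCGATAAGTT", "TTGATAAGAA"]
background = ["ACGTACGTAC", "TTTTCCCCGG", "GGCCAATTGG"]
result = enriched_kmers(foreground, background, k=6)
```

In this toy input the top-ranked k-mer is `GATAAG`, which occurs in all three foreground sequences and in none of the background ones. A k-mer found this way would then be compared against known TF motifs.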

You require more sequences with a close relationship to your current one. "Close relationship" can be defined as: co-expressed, tissue specific, homologues...and many other ways.


Right, I understand that it might not be relevant, especially with only one sequence. But if I have a set of sequences that are potentially co-regulated, as you say, what tool could I use to predict TF binding sites (Drosophila)? Or is there a tool that takes conservation into account? Thanks

written 19 months ago by wildtype

There are lots: MEME, HOMER, and i-cisTarget, to name a few. Personally, I quite like i-cisTarget.

written 19 months ago by YaGalbi
Ben wrote, 19 months ago:

The best way to find the potential binding site in the 300bp region is to use ChIP-seq data.

Whoknows (Tehran, Iran) wrote, 19 months ago:

You could try the Genomatix tools. There are two of them: one for finding the best TF binding sites, and one for finding the best TFs for candidate genes.

Genomatix takes either an input sequence or a gene symbol and returns candidate binding sites or TFs.
