Question: DNA binding site motif database
0
3.6 years ago by
jolespin120
United States
jolespin120 wrote:

I have a list of unique k-mers (5-mers in this case) that are essential to the pathway I'm researching. Is there a database where I can find which proteins recognize these motifs? Proteins that bind DNA or RNA are both fine; I'm just not sure where to find such a database. I'm looking at human sequences, but it would be great if there were one that covered all organisms too. For example, say you were looking for all proteins that bind "TCCTG".

modified 3.6 years ago • written 3.6 years ago by jolespin120

Did you try Tomtom from the MEME Suite? It compares a query motif against many motif databases. You may have to convert your k-mers to motifs first.
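One way to turn an exact k-mer into a motif is to write it out as a letter-probability matrix in the minimal MEME text format. The sketch below is only an illustration: the function name `kmer_to_meme`, the pseudocount of 0.01, and the `nsites` value of 20 are arbitrary assumptions, not values recommended by the MEME Suite.

```python
def kmer_to_meme(kmer, pseudocount=0.01, nsites=20):
    """Return minimal MEME-format text describing one exact DNA k-mer.

    pseudocount and nsites are illustrative assumptions, not MEME defaults.
    """
    alphabet = "ACGT"
    kmer = kmer.upper()
    if not kmer or any(base not in alphabet for base in kmer):
        raise ValueError(f"k-mer must contain only {alphabet}: {kmer!r}")

    rows = []
    for base in kmer:
        # The observed base gets nearly all of the probability mass; the
        # pseudocount keeps the other three letters from being exactly zero.
        weights = [(1.0 if letter == base else 0.0) + pseudocount
                   for letter in alphabet]
        total = sum(weights)
        rows.append("  ".join(f"{w / total:.6f}" for w in weights))

    header = [
        "MEME version 4",
        "",
        "ALPHABET= ACGT",
        "",
        "strands: + -",
        "",
        "Background letter frequencies (from uniform background):",
        "A 0.25000 C 0.25000 G 0.25000 T 0.25000",
        "",
        f"MOTIF {kmer}",
        "",
        f"letter-probability matrix: alength= 4 w= {len(kmer)} nsites= {nsites} E= 0",
    ]
    return "\n".join(header + rows) + "\n"


print(kmer_to_meme("TCCTG"))
```

The printed text could be saved to a file and used as the query motif, with one of the downloaded motif database files as the target. A 5-mer is short, so expect many weak matches.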

written 3.6 years ago by Fidel1.9k
4
3.6 years ago by
Kamil1.9k
Boston
Kamil1.9k wrote:

Try the MEME archive of motif databases. It includes multiple databases and species.

After downloading the motif_databases.12.6.tar.gz file and extracting it, we'll see a folder called motif_databases containing one subfolder for each of several motif databases.
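To get an overview before extracting anything, the archive contents can be listed from Python. This is a rough sketch that assumes the archive uses the motif_databases/<database>/<file>.meme layout described above; the function name `meme_files_by_database` is made up for illustration.

```python
import tarfile
from collections import defaultdict


def meme_files_by_database(archive_path):
    """Map each database folder inside the archive to its .meme file names."""
    grouped = defaultdict(list)
    with tarfile.open(archive_path, "r:*") as archive:
        for name in archive.getnames():
            # Drop empty and "." components so "./motif_databases/..." also works.
            parts = [p for p in name.split("/") if p not in ("", ".")]
            # Assumed layout: motif_databases/<DATABASE>/<file>.meme
            if (len(parts) >= 3 and parts[0] == "motif_databases"
                    and parts[-1].endswith(".meme")):
                grouped[parts[1]].append(parts[-1])
    return {database: sorted(files) for database, files in grouped.items()}


# Example call (the path is whatever you saved the download as):
# for database, files in meme_files_by_database("path/to/archive.tar.gz").items():
#     print(database, len(files))
```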

For example, let's have a look at motif_databases/CIS-BP/Homo_sapiens.meme:

MEME version 4.4

ALPHABET= ACGT

strands: + -

Background letter frequencies (from uniform background):
A 0.25000 C 0.25000 G 0.25000 T 0.25000 

MOTIF M0085_1.02 (TFAP2E)_(Mus_musculus)_(DBD_0.99)

letter-probability matrix: alength= 4 w= 10 nsites= 1 E= 0
  0.213214      0.176319      0.135951      0.474516    
  0.222124      0.321576      0.134725      0.321576    
  0.004784      0.213834      0.753855      0.027528    
  0.000431      0.961004      0.000168      0.038397    
  0.000296      0.937765      0.000164      0.061775    
  0.001274      0.327763      0.135950      0.535013    
  0.122837      0.314460      0.391841      0.170862    
  0.454990      0.254748      0.230506      0.059756    
  0.118505      0.001133      0.841307      0.039054    
  0.002871      0.001657      0.957955      0.037517    

URL http://cisbp.ccbr.utoronto.ca/TFreport.php?searchTF=T004846_1.02

This motif is for TFAP2E (transcription factor AP-2 epsilon), and we can learn a bit more about it at the URL listed at the bottom of the record.
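Since each record names its protein on the MOTIF line, a quick way to see which proteins a file covers is to collect those lines. The sketch below assumes the labels follow the parenthesized pattern seen in this record, (TF)_(species)_(DBD score); files from other databases may label motifs differently. The function name `list_motifs` is hypothetical.

```python
import re

# "MOTIF <identifier> <optional label>"
MOTIF_LINE = re.compile(r"^MOTIF\s+(\S+)(?:\s+(.*))?$")


def list_motifs(meme_text):
    """Return one dict per MOTIF line with its id, raw label, and TF name."""
    records = []
    for line in meme_text.splitlines():
        match = MOTIF_LINE.match(line.strip())
        if not match:
            continue
        motif_id = match.group(1)
        label = (match.group(2) or "").strip()
        # Assumed label pattern: (TF)_(Species)_(DBD_score); take the first field.
        fields = re.findall(r"\(([^)]*)\)", label)
        tf_name = fields[0] if fields else label
        records.append({"id": motif_id, "label": label, "tf": tf_name})
    return records


sample = "MOTIF M0085_1.02 (TFAP2E)_(Mus_musculus)_(DBD_0.99)\n"
for record in list_motifs(sample):
    print(record["id"], record["tf"])
```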

modified 3.6 years ago • written 3.6 years ago by Kamil1.9k

I'm not finding a way to actually get the proteins that bind these motifs.

written 3.6 years ago by jolespin120

I edited my answer to include more details. You might want to check out the CIS-BP website.

written 3.6 years ago by Kamil1.9k

Thanks, Kamil. Which k-mer did you search for here? I see the alphabet, but that isn't the k-mer, is it? Say, for example, you were looking for proteins that bind "TCCTG".

written 3.6 years ago by jolespin120

I didn't search for a k-mer; I'm showing you an example of a motif called M0085_1.02. The letter-probability matrix describes the motif. In this case, the motif is 10 bases long. Each row is one position in the motif, and each column corresponds to one of the letters in the alphabet ACGT. So the probability of an A in the first position is 0.213214, the probability of an A in the second position is 0.222124, and so on.
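To make the matrix concrete, here is a small sketch that slides a k-mer along the motif on both strands and reports the best log-odds score against a uniform background. This is only a simple illustration of how to read the matrix, not the comparison that Tomtom performs; the probability floor of 1e-6 and the function names are assumptions of mine.

```python
import math

ALPHABET = "ACGT"

# Letter-probability matrix copied from the M0085_1.02 (TFAP2E) record.
TFAP2E = [
    [0.213214, 0.176319, 0.135951, 0.474516],
    [0.222124, 0.321576, 0.134725, 0.321576],
    [0.004784, 0.213834, 0.753855, 0.027528],
    [0.000431, 0.961004, 0.000168, 0.038397],
    [0.000296, 0.937765, 0.000164, 0.061775],
    [0.001274, 0.327763, 0.135950, 0.535013],
    [0.122837, 0.314460, 0.391841, 0.170862],
    [0.454990, 0.254748, 0.230506, 0.059756],
    [0.118505, 0.001133, 0.841307, 0.039054],
    [0.002871, 0.001657, 0.957955, 0.037517],
]


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def best_log_odds(matrix, kmer, background=0.25, floor=1e-6):
    """Best (score, strand, offset) of kmer over all ungapped placements.

    Returns None if the k-mer is longer than the motif. The floor avoids
    log(0) and is an arbitrary choice.
    """
    best = None
    for strand, seq in (("+", kmer), ("-", reverse_complement(kmer))):
        for offset in range(len(matrix) - len(seq) + 1):
            score = sum(
                math.log2(max(matrix[offset + i][ALPHABET.index(base)], floor)
                          / background)
                for i, base in enumerate(seq)
            )
            if best is None or score > best[0]:
                best = (score, strand, offset)
    return best


print(best_log_odds(TFAP2E, "TCCTG"))
```

A positive score means the k-mer fits that stretch of the motif better than random sequence would; repeating this over every motif in a file gives a crude ranking of candidate binding proteins.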

As Fidel mentioned, you might consider running TOMTOM or GOMo with your "TCCTG" sequence.

written 3.6 years ago by Kamil1.9k