Question: Examples Of Dna Sequence Motif Sets For Testing Search Algorithm
1
2.5 years ago by
Faheemmitha160
Faheemmitha160 wrote:

This is a follow-up to Resurrecting DNA motif finding project.

I'm looking for sets of aligned DNA sequence motifs to use for testing my search algorithm. This algorithm looks for correlations across the whole motif, so it performs best if

a) The motif is short, preferably between 10 and 30 characters long. Anything shorter or longer would probably not work well.

b) The set is large, ideally several hundred sequences. The longer the motif, the larger the set needs to be.

If you know of motifs like these, please list them. It would be helpful if a link could be provided to the data, preferably as a FASTA file, and also a description of the biological significance of the motifs. A description of the conserved regions would also be helpful.
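To make the two criteria concrete, here is a minimal sketch in Python of how a candidate FASTA file could be screened against them. The function names (`read_fasta`, `summarize_motif_set`) and the default threshold of 200 sequences are assumptions chosen for illustration; "several hundred" is not a precise cutoff, and this is not part of the search algorithm itself.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_motif_set(records, min_len=10, max_len=30, min_count=200):
    """Report whether a set of sequences meets criteria a) and b).

    min_len, max_len and min_count are illustrative defaults (assumptions).
    """
    lengths = {len(seq) for _, seq in records}
    return {
        "count": len(records),
        "lengths": sorted(lengths),
        # An aligned, ungapped motif set has a single common length.
        "aligned": len(lengths) == 1,
        "length_ok": bool(lengths) and all(min_len <= n <= max_len for n in lengths),
        "size_ok": len(records) >= min_count,
    }


# Tiny made-up example; a real file would be opened with open(path).
example = StringIO(">site1\nACGTACGTACGT\n>site2\nACGTTCGTACGA\n")
records = list(read_fasta(example))
print(summarize_motif_set(records))
```

On the two made-up sequences above, the length check passes and the size check fails, which is the kind of set I would want to avoid.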

I'm not a biologist, so please don't assume a lot of biological background. Thanks.

modified 2.5 years ago by Giovanni M Dall'Olio17k • written 2.5 years ago by Faheemmitha160
3
2.5 years ago by
Sean Davis15k
Bethesda, MD
Sean Davis15k wrote:

You might take a look at the JASPAR database, if I understand your question correctly.

written 2.5 years ago by Sean Davis15k

Thanks Sean. This looks interesting. Now all I have to do is figure out what I need to download... I can't tell if FASTA files are available - I don't see them.

written 2.5 years ago by Faheemmitha160

The files in http://jaspar.genereg.net/html/DOWNLOAD/sites/ look like FASTA files, though they are labelled *.sites. Have I got this correct? Is this what I need?

written 2.5 years ago by Faheemmitha160

Yes, those are FASTA format files.
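Once the sequences are loaded, a rough sketch of turning them into aligned sites and per-position counts might look like the Python below. It assumes that each record marks the binding site in uppercase and any flanking sequence in lowercase; please check this against the actual files before relying on it. The helper names `core_site` and `position_counts` and the example sequences are made up for illustration.

```python
from collections import Counter


def core_site(seq):
    """Return only the uppercase characters of seq.

    Assumption: uppercase marks the site, lowercase marks the flanks.
    """
    return "".join(ch for ch in seq if ch.isupper())


def position_counts(sites):
    """Count the bases seen at each position of a set of aligned sites."""
    if len({len(s) for s in sites}) > 1:
        raise ValueError("sites differ in length; they are not aligned")
    # zip(*sites) walks the alignment column by column.
    return [Counter(column) for column in zip(*sites)]


# Made-up records, not taken from any database.
raw = ["ttgaCACGTGcc", "aCACGTTggtca", "CATGTGaa"]
sites = [core_site(s) for s in raw]
counts = position_counts(sites)

for i, column in enumerate(counts, start=1):
    print(i, dict(column))
```

Columns where a single base dominates are the conserved positions; the length check guards against sets that are not actually aligned.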

written 2.5 years ago by Sean Davis15k
1
2.5 years ago by
Ian3.3k
University of Manchester, UK
Ian3.3k wrote:

You might find data from the UniPROBE project useful (but it is mostly mouse-based).

written 2.5 years ago by Ian3.3k
0
2.5 years ago by
London, UK
Giovanni M Dall'Olio17k wrote:

You should look at PROSITE, the database of protein domain profiles from the same institute as UniProt.

Unfortunately, I think that most DNA regulatory motifs are shorter than 10 nucleotides. For example, splicing signals are usually composed of many short degenerate motifs that interact with each other.

written 2.5 years ago by Giovanni M Dall'Olio17k