Question: Examples Of DNA Sequence Motif Sets For Testing Search Algorithm
1
Faheemmitha (190) wrote, 3.4 years ago:

This is a follow-up to "Resurrecting DNA motif finding project".

I'm looking for sets of aligned DNA sequence motifs to use for testing my search algorithm. This algorithm looks for correlations across the whole motif, so it performs best if:

a) The motif is short, preferably between 10 and 30 characters long. Anything shorter or longer would probably not work well.

b) The set is large, ideally several hundred sequences. The longer the motif, the larger the set needs to be.

If you know of motifs like these, please list them. It would be helpful if a link could be provided to the data, preferably as a FASTA file, and also a description of the biological significance of the motifs. A description of the conserved regions would also be helpful.

I'm not a biologist, so please don't assume a lot of biological background. Thanks.

Modified 3.4 years ago by Giovanni M Dall'Olio (20k)
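
The two criteria in the question (motif length between 10 and 30 characters, and a set of several hundred aligned sequences) can be turned into a quick screening check for any candidate set. The following is only a minimal sketch in Python: the function name `check_motif_set` is hypothetical, and the default threshold of 200 sequences is an assumed reading of "several hundred", not a figure from the question.

```python
def check_motif_set(sites, min_len=10, max_len=30, min_count=200):
    """Return (ok, reason) for a list of aligned site strings.

    min_len and max_len follow the 10 to 30 range given in the question.
    min_count=200 is an assumed stand-in for "several hundred".
    """
    if not sites:
        return False, "empty set"

    # Aligned sites should all have the same length.
    lengths = {len(s) for s in sites}
    if len(lengths) != 1:
        return False, "sites are not all the same length"

    length = lengths.pop()
    if not (min_len <= length <= max_len):
        return False, f"motif length {length} outside {min_len}-{max_len}"

    if len(sites) < min_count:
        return False, f"only {len(sites)} sites, want at least {min_count}"

    return True, "ok"
```

Since longer motifs need larger sets, `min_count` would probably have to be raised for motifs near the upper end of the length range.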
3
Sean Davis (17k), Bethesda, MD, wrote 3.4 years ago:

You might take a look at the JASPAR database, if I understand your question correctly.


Thanks Sean. This looks interesting. Now all I have to do is figure out what I need to download... I can't tell if FASTA files are available - I don't see them.

Reply written 3.4 years ago by Faheemmitha (190)

The files in http://jaspar.genereg.net/html/DOWNLOAD/sites/ look like FASTA files, though they are labelled *.sites. Have I got this correct? Is this what I need?

Reply written 3.4 years ago by Faheemmitha (190)

Yes, those are FASTA format files.

Reply written 3.4 years ago by Sean Davis (17k)
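
Because the *.sites files are in FASTA format, they can be read with a few lines of standard-library Python and turned into per-position base counts. This is a sketch under assumptions: the helper names `read_fasta` and `count_matrix` are hypothetical, the example records are made up for illustration and are not real JASPAR data, and the code assumes the sequences have already been trimmed to aligned sites of equal length.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def count_matrix(sequences):
    """Count A, C, G and T in each column of equal-length sequences."""
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must all have the same length")
    counts = [{base: 0 for base in "ACGT"} for _ in range(length)]
    for seq in sequences:
        for i, base in enumerate(seq.upper()):
            if base in counts[i]:  # other symbols (e.g. N) are skipped
                counts[i][base] += 1
    return counts


# Made-up example input, not taken from any real database.
example = ">site1\nACGTAC\n>site2\nACGTTC\n"
records = read_fasta(example)
matrix = count_matrix([seq for _, seq in records])
```

For a real file, the text would come from something like `open(path).read()`, where `path` is wherever the downloaded file was saved.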
1
Ian (3.7k), University of Manchester, UK, wrote 3.4 years ago:

You might find data from the UNIPROBE project useful (but it is mostly mouse based).

0
Giovanni M Dall'Olio (20k), London, UK, wrote 3.4 years ago:

You should look at Prosite, which is the database of Protein Domain Profiles from the same institute as Uniprot.

Unfortunately, I think that most DNA regulatory motifs are shorter than 10 nucleotides. For example, splicing signals are usually composed of many short degenerate motifs that interact with each other.
