This is a follow-up to "Resurrecting DNA motif finding project".
I'm looking for sets of aligned DNA sequence motifs to use for testing my search algorithm. This algorithm looks for correlations across the whole motif, so it performs best if
a) The motif is short, preferably between 10 and 30 characters long. Anything shorter or longer would probably not work well.
b) The set is large, ideally several hundred sequences. The longer the motif, the larger the set needs to be.
If you know of motifs like these, please list them. It would be helpful if a link could be provided to the data, preferably as a FASTA file, and also a description of the biological significance of the motifs. A description of the conserved regions would also be helpful.
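To make the two criteria concrete, here is a minimal sketch of how a candidate FASTA file could be checked against them. The function names and the threshold of 200 sequences (my reading of "several hundred") are assumptions for illustration, not part of any existing tool.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def summarize_motif_set(records, min_len=10, max_len=30, min_count=200):
    """Report whether a motif set fits the length and size criteria.

    min_count=200 is an assumed stand-in for "several hundred".
    """
    lengths = {len(seq) for _, seq in records}
    # Aligned motifs without indels should all have the same length.
    aligned = len(lengths) == 1
    length = lengths.pop() if aligned else None
    return {
        "count": len(records),
        "aligned": aligned,
        "length": length,
        "length_ok": aligned and min_len <= length <= max_len,
        "count_ok": len(records) >= min_count,
    }
```

For a real file, something like `summarize_motif_set(read_fasta(open(path).read()))` would give a quick yes/no on each criterion.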
I'm not a biologist, so please don't assume a lot of biological background. Thanks.
You should look at PROSITE, the database of protein domain profiles from the same institute as UniProt.
Unfortunately, I think that most DNA regulatory motifs are shorter than 10 nucleotides. For example, splicing signals are usually composed of many short, degenerate motifs that interact with each other.
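In case "degenerate" is unfamiliar: a degenerate motif allows more than one nucleotide at some positions, usually written with IUPAC ambiguity codes (for example, R means A or G). As a rough sketch, such a motif can be turned into a regular expression and searched for. The motif `GTR` used in the usage note is made up for illustration and is not a real splicing signal.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate a degenerate motif such as 'GTR' into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(motif, sequence):
    """Return 0-based start positions of all matches, including overlaps."""
    # The lookahead (?=...) lets overlapping occurrences be reported.
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]
```

For instance, `find_motif("GTR", "AGTAGGTGCC")` reports a match at each place where `GT` is followed by `A` or `G`.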