Following up on this question: http://biostar.stackexchange.com/questions/598/tools-for-chipseq-scale-motif-finding
I have a large number of unaligned eukaryotic regulatory sequences and I want to do de novo motif discovery on them. They have already been filtered to remove reads that did not map and reads that would not contribute to a peak.
Most tools I've seen require aligned sequences and/or search only for a list of predefined motifs.
In its simplest form, what I am looking for is a program that would read file.fa, where file.fa contains ~1M regulatory sequences of 50-200 bp each, and produce motif predictions without needing to align the sequences to a reference or scan for known motifs.
Does anybody know of a tool that can do de novo motif discovery on this many unaligned FASTA sequences?