I am interested in identifying the lengths of all stretches of certain repeats (unit length 5-50 bp), for example all occurrences of (GGAAT)n with a total length of at least 1000 bp. I could write a regular expression for perfect stretches of the repeats, but since real stretches will likely be imperfect, some Smith-Waterman-type alignment would be better. I will be searching for these repeats per chromosome, so the input will be a single large FASTA file and a motif to search for.
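For context, the perfect-match regex approach could look roughly like the Python sketch below. The function name `perfect_repeat_runs` and the `min_len` parameter are illustrative placeholders, not existing code, and the sequence is assumed to be already loaded into a string. It only reports uninterrupted runs on the given strand, which is exactly the limitation described above.

```python
import re

def perfect_repeat_runs(seq, motif, min_len=1000):
    """Yield (start, end, length) for uninterrupted tandem runs of `motif`.

    Coordinates are 0-based, end-exclusive. Only perfect copies on the
    given strand are matched; a single mismatch or indel splits a run.
    """
    # Smallest number of whole motif copies that reaches min_len (ceiling division).
    min_copies = max(1, -(-min_len // len(motif)))
    pattern = re.compile(
        r"(?:%s){%d,}" % (re.escape(motif), min_copies),
        re.IGNORECASE,  # tolerate soft-masked (lowercase) sequence
    )
    for m in pattern.finditer(seq):
        yield m.start(), m.end(), m.end() - m.start()

# Toy example: one run of 4 copies (20 bp) and one run of 2 copies (10 bp).
toy = "ACGT" + "GGAAT" * 4 + "TTT" + "GGAAT" * 2
runs = list(perfect_repeat_runs(toy, "GGAAT", min_len=15))
```

With `min_len=15` only the 20 bp run is reported; the 10 bp run is too short. An imperfect copy inside a long array would break it into several shorter runs, so the length distribution from this approach would presumably be biased toward short stretches.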
PS: Software like TRF identifies repeats de novo, but I already have a list of motifs to search for; I only need to estimate their length distribution.