If I understood your problem, I think it corresponds to enumerate all possible combinations of substitutions on all combinations of k positions in a sequence. (k is the edit distance)
This can be done by Knuth algorithm for combination enumeration. First you enumerate combinations of positions then you enumerate combinations of modifications.
I made a brute force solver for alignments based on this idea. But you can not avoid complexity.
For example, if you want to generate sequences at an edit distance 2 (k) from a reference sequence of length 5 (m) with an alphabet of 4 (a) letters:
First, you choose 2 positions from 5 without repeats (10 pairs)
Second, on these positions, you enumerate the combinations of modifications with repeats (6).
So you have 60 sequences.
I think the number of solutions is like that : (m! / (k!*(m-k)!)) * ((a+k-2)!/(k!(a-2)!))
I hope it will helps you ;)
Bilou.
Edit :
After discussion, I think the best way to generate your sequences is the following :
- Generate sequences with distance k with 0 indel
- Generate sequences with distance k-1 with 1 indel
- Generate sequences with distance k-2 with 2 indels
- ...
- Generate sequences with distance 0 with k indels
It is a huge number of sequences !!
if you come up with a simple implementation, can you post a link to some code? this is an interesting problem.
I will certainly post the code.