8.9 years ago by

Saclay, France

If I understood your problem, I think it corresponds to enumerate all possible combinations of substitutions on all combinations of k positions in a sequence. (k is the edit distance)

This can be done by Knuth algorithm for combination enumeration. First you enumerate combinations of positions then you enumerate combinations of modifications.

I made a brute force solver for alignments based on this idea. But you can not avoid complexity.

For example, if you want to generate sequences at an edit distance 2 (k) from a reference sequence of length 5 (m) with an alphabet of 4 (a) letters:

First, you choose 2 positions from 5 without repeats (10 pairs)

Second, on these positions, you enumerate the combinations of modifications with repeats (6).

So you have 60 sequences.

I think the number of solutions is like that : `(m! / (k!*(m-k)!)) * ((a+k-2)!/(k!(a-2)!))`

I hope it will helps you ;)

Bilou.

Edit :

After discussion, I think the best way to generate your sequences is the following :

- Generate sequences with distance k with 0 indel
- Generate sequences with distance k-1 with 1 indel
- Generate sequences with distance k-2 with 2 indels
- ...
- Generate sequences with distance 0 with k indels

It is a huge number of sequences !!

•

link
modified 5 months ago
by
RamRS ♦ **20k**
•
written
8.9 years ago by
Bilouweb • **1.1k**
if you come up with a simple implementation, can you post a link to some code? this is an interesting problem.

22kI will certainly post the code.

3.3k