Question

Longest common substring between two sequences with minimum error correction

0

Entering edit mode

2.8 years ago

Alex • 0

I have two strings A and B. I want to find the longest common substring between these two strings while allowing minimum error correction in B or A. The errors are in the form of adjacent swaps, for example s_1s_2...s_i s_{i+1}...s_n is one swap away from the string s_1s_2...s_{i+1} s_{i}...s_n. Is there any dynamic programming solution for it?

The score for matching a letter in two strings is M and the penalty for swapping is P. So in total the objective is maximizing the total M - P. Obviously If I swap two adjacent letters in A and there is a matched in B for these letters, then the score will increase by 2M - P.

I have written a recursive code in Python for calculating the LCS but I don't know how to add the swap correction.

M = 1 # score for matching a letter in two strings
P = 1 # penalty for swapping

def lcs(A, B, i, j, res):
     if i == -1 or j == -1:
         return res
     if A[i] == B[j]:
         res = lcs(A, B, i - 1, j - 1, res + M)
     return max(res, max(lcs(A, B, i, j - 1, 0), lcs(A, B, i - 1, j, 0)))


>>> A = "fghijklmnopq"
>>> B = "klmnopqrstuv"
>>> res = lcs(A, B, len(A) - 1, len(B) - 1, 0)
>>> print(res)
7

algorithm • 402 views

ADD COMMENT • link 2.8 years ago by Alex • 0