Is There A Routine In Biopython For Finding The Longest Common Subsequence Of Two Sequences?
2
3
Entering edit mode
11.1 years ago
sarochan ▴ 30

I apologize in advance if this is a very stupid question. I'm very new to using Biopython (and to Python/coding in general), so sometimes it's difficult for me to find my way around.

Is there a subroutine in Biopython that will give one the longest common subsequence of two sequences? I was reading the tutorial and found many ways of aligning two sequences, but I'm not sure if they give the longest common subsequence or not.

Sorry once again if this is incredibly dumb, and thank you in advance!

biopython sequence alignment • 6.8k views
ADD COMMENT
0
Entering edit mode

Not silly at all; in fact a classic computer science problem.

ADD REPLY
7
Entering edit mode
11.1 years ago

If you want to find the longest substring without any mismatches between two strings you could use this function:

def longest_substring(s1, s2):
    t = [[0]*(1+len(s2)) for i in range(1+len(s1))]
    l, xl = 0, 0
    for x in range(1,1+len(s1)):
        for y in range(1,1+len(s2)):
            if s1[x-1] == s2[y-1]:
                t[x][y] = t[x-1][y-1] + 1
                if t[x][y]>l:
                    l = t[x][y]
                    xl  = x
            else:
                t[x][y] = 0
    return s1[xl-l: xl]

It gives you:

s1 = "sherlock holmes and dr watson"
s2 = "dr watson and jack the ripper"
print longest_substring(s1,s2)
'dr watson'
ADD COMMENT
1
Entering edit mode
11.1 years ago

Generally, perhaps you can apply a Needleman-Wunsch global alignment with a high gap penalty. Bring in the alignments into an AlignIO object, iterate through them and count the longest run of bases between gaps within an alignment. Then sort alignments by this count and pull out the top hit(s).

ADD COMMENT

Login before adding your answer.

Traffic: 1610 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6