Biopython pairwise2 doesn't give the optimal alignment
1
0
Entering edit mode
5.1 years ago
Bo Sim • 0

Hi,

I'm running Biopython 1.66 on Linux. In my current example I'm trying to align two string using the pairwise2 package I'm using a scoring matrix for the matching and simple scoring for gaps.

Here's the code

scoreMatrix = {('A', 'A'): 2, ('A', 'C'): -1, ('A', 'G'): -1, ('A', 'T'): -1,
                                   ('C', 'C'): 2, ('C', 'A'): -1, ('C', 'G'): -1, ('C', 'T'): -1,
                                   ('G', 'G'): 2, ('G', 'C'): -1, ('G', 'A'): -1, ('G', 'T'): -1,
                                   ('T', 'T'): 2, ('T', 'C'): -1, ('T', 'G'): -1, ('T', 'A'): -1,
                                   ('X', 'X'): 2,
                                   ('X', 'T'): -11,
                                   ('X', 'C'): -11,
                                   ('X', 'G'): -11,
                                   ('X', 'A'): -11,
                                   ('T', 'X'): -11,
                                   ('C', 'X'): -11,
                                   ('G', 'X'): -11,
                                   ('A', 'X'): -11
                                   }
                    monomerAllignmentsGlobal = pairwise2.align.globalds(seq1, seq2, scoreMatrix, -1, -0.5)

The resulting alignment does not seem to be optimal:

CTCGGAGTCCCAGAGCCAAGGAGGCTCCCGCTGCCGGGCCCTGAGGCAGAAACCTCTCGGGCCGGGCGGACCCCTGTGCTCTCACCAGGAAG---------- |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| CTCGGAGTCCCAGAGCCAAGGAGGCTCCCGCCGCCGGGCCCTGAGGCAGAAACCTCTCGGGCCGGGCGGACCCCTGTGCTCTC--------AXXXXXXXXXX

(I apologize I don't know how to format the result better, copying it to a text editor might give you a better overview)

The last 'A' in front of 'XXXXXXXXXX' should ideally come before the last gap opening.

Is there a bug in Biopython or am I missing something?

Any help is appreciated.

Boris

Biopython alignment pairwise • 1.5k views
ADD COMMENT
3
Entering edit mode
5.1 years ago
Markus ▴ 310

This is a limitation of the former implementation of the Gotoh algorithm in Biopython's pairwise2 module. In this implementation a gap in one sequence cannot be followed by a gap in the other sequence.

Good news: This has been changed in Biopython 1.68.

Running with Biopyhton 1.68 you get (truncated):

CTCGGAGTCCCAGAGCCAAGGAG...ACCTCTCGGGCCGGGCGGACCCCTGTGCTCTCACCAGGAAG----------
|||||||||||||||||||||||...|||||||||||||||||||||||||||||||||||||||||||||||||||
CTCGGAGTCCCAGAGCCAAGGAG...ACCTCTCGGGCCGGGCGGACCCCTGTGCTCTCA--------XXXXXXXXXX
  Score=155

This is your expected result.

ADD COMMENT
0
Entering edit mode

Thank you Markus, upgrading to 1.68 did help.

ADD REPLY

Login before adding your answer.

Traffic: 2396 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6