Using pairwise2 for list[str] alignment
7.5 years ago
erannir5 ▴ 10

I'm trying to abuse Biopython for a non-bioinformatics problem: aligning general sentences to find common patterns. To do so, I break each sentence into a list of strings and align the lists with pairwise2.

Example (not real sentences):

list1 = ["MEEP", "QS", "DPSV", "EPPLS"]
list2 = ["MEES", "QS", "DISL", "EPPLS"]

Using globalxs:

alns = pairwise2.align.globalxs(list1, list2, -10, -0.5, gap_char=['-'])
top_aln = alns[0]
aln_1, aln_2, score, begin, end = top_aln

The resulting alignment is not what I had hoped for:

['P', 'E', 'E', 'M', 'S', 'Q', 'V', 'S', 'P', 'D', 'S', 'L', 'P', 'P', 'E']
['S', 'E', 'E', 'M', 'S', 'Q', 'L', 'S', 'I', 'D', 'S', 'L', 'P', 'P', 'E']

I tried to debug the source code. I'm not totally fluent in Python, so it was hard to follow all the hidden calls to different functions, but the string comparison looks fine (it happens when __call__ of class identity_match is called, line 901), so I guess the match matrix is fine. The problem is probably in the backtracing that builds the alignment, where I keep ending up in this block (line 749):

    elif trace % 4 == 2:  # = match/mismatch of seqA with seqB
        trace -= 2
        row -= 1
        col -= 1
        ali_seqA += sequenceA[row]
        ali_seqB += sequenceB[col]
        col_gap = False

so I get no gaps. It's my first run with Biopython, so I guess it's something with my configuration. What I really don't get, though, is why the strings are split into characters and reversed.
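One plausible reading of the split-and-reversed output, offered as an assumption rather than a confirmed trace of the Biopython source: if the accumulator in that block is a list when the input is a list, then `+=` with a string extends the list one character at a time, and a final reversal of the traceback would then reverse the characters too. The sketch below reproduces that effect in plain Python; the variable names and the loop are hypothetical stand-ins for the traceback, not library code.

```python
# Hypothetical stand-in for the traceback accumulation; NOT the Biopython source.
list1 = ["MEEP", "QS", "DPSV", "EPPLS"]

ali_seq = []                    # assumption: the accumulator is a list for list input
for token in reversed(list1):   # a traceback walks from the end of the sequence to the start
    ali_seq += token            # list += str extends the list character by character
ali_seq.reverse()               # assumption: the traceback result is reversed at the end
print(ali_seq)

# Adding each token as a one-element list keeps the tokens intact.
fixed = []
for token in reversed(list1):
    fixed += [token]
fixed.reverse()
print(fixed)
```

The first list printed is the same as the first aligned sequence shown in the question, which is why this explanation seems likely. Whether the actual fix in Biopython works like the second loop is not something this thread confirms.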

6.7 years ago
Markus ▴ 320

Sorry, I saw this post quite late. There was a bug in recent Biopython versions (1.68/1.69) which prevented proper handling of list input in pairwise2. This is now solved (Biopython version 1.70):

from Bio import pairwise2 as pw

list1 = ["MEEP", "QS", "DPSV", "EPPLS"]
list2 = ["MEES", "QS", "DISL", "EPPLS"]

alns = pw.align.globalxs(list1, list2, -10, -0.5, gap_char=['-'])

print(pw.format_alignment(*alns[0]))

gives

['MEEP', 'QS', 'DPSV', 'EPPLS']
||||
['MEES', 'QS', 'DISL', 'EPPLS']
  Score=2

(The output isn't nice for lists...)
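If a more readable view is wanted, a small formatter can pad each token to a common column width. This is only a sketch: `format_token_alignment` is a hypothetical helper, not part of Biopython, and it assumes the two aligned sequences are lists of strings of equal length (as in `alns[0][0]` and `alns[0][1]` above) with gaps represented by the string '-'.

```python
def format_token_alignment(aln_a, aln_b, gap="-"):
    """Return a three-line, column-aligned view of two aligned token lists.

    Hypothetical helper (not a Biopython function). Assumes both lists
    have the same length and that gaps are the string given by `gap`.
    """
    # Each column is as wide as the longer of the two tokens in it.
    widths = [max(len(a), len(b)) for a, b in zip(aln_a, aln_b)]
    top = " ".join(a.ljust(w) for a, w in zip(aln_a, widths))
    bottom = " ".join(b.ljust(w) for b, w in zip(aln_b, widths))
    # Mark a column with '|' only when both tokens are identical and not a gap.
    marks = " ".join(
        ("|" if a == b and a != gap else " ").ljust(w)
        for a, b, w in zip(aln_a, aln_b, widths)
    )
    return "\n".join(line.rstrip() for line in (top, marks, bottom))


aligned_a = ["MEEP", "QS", "DPSV", "EPPLS"]
aligned_b = ["MEES", "QS", "DISL", "EPPLS"]
formatted = format_token_alignment(aligned_a, aligned_b)
print(formatted)
```

Matching tokens are marked at the start of their column, so the marker line stays aligned even when tokens have different lengths.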
