Question: index of the residue matched using pairwise2 in biopython
0
22 months ago by

I am interested in knowing the index of residues that match to a string using pairwise 2 in python.

For example I have two strings

``````A:' EEEEE      HHH     HHH             EEEEE'
``````

and

``````B: 'EEE       EEEE       HHH'
``````

and I am interested in finding a the best match between both of them using local pairwise2 function in biopython. One of the alignments that I get is :

``````EEE-------EE---      HHH     HHH             EEEEE
|||       ||   |||||||||
EEE       EEEE       HHH--------------------------
Score=29.6
``````

I want to get the indices of the match i.e the positions of all the `Es`, `Hs` and `' '` from seq A that matched with seq B.

How do I do that?

modified 22 months ago by Russ460 • written 22 months ago by akdbharadwajiitkgp0
1
22 months ago by
finswimmer13k
Germany
finswimmer13k wrote:

Hello,

you could write a function which iterates over the alignment strings and look at which position are the same symbols.

``````from Bio import pairwise2

def match_index(alignment):
matches = []

for i, (a, b) in enumerate(zip(alignment[0], alignment[1])):
if a == b:
matches.append(i)

return matches

seq1 = " EEEEE      HHH     HHH             EEEEE"
seq2 = "EEE       EEEE       HHH"

alignments = pairwise2.align.localxx(seq1, seq2)
m = match_index(alignments[0])
print(m)
``````

fin swimmer

1
22 months ago by
Russ460
Ontario Veterinary College, University of Guelph, Guelph, Ontario, Canada
Russ460 wrote:

I took a look at the pairwise2 module but also couldn't find any easy way to obtain the indices of the matches ("|"). I was able to modify the format_alignment method slightly to print a list of indices (that starts at 0) of the pipe operator for this specific example - but I'm not sure how it will perform under different testing conditions. Where this may fail is if the "begin" variable != 0 - you'll have to keep an eye out for that.

``````from Bio import pairwise2
from Bio.pairwise2 import format_alignment

## This is a modification of the format_alignment method
def match_index(align1, align2, score, begin, end):
"""Format the alignment prettily into a string.

Since Biopython 1.71 identical matches are shown with a pipe
character, mismatches as a dot, and gaps as a space.

Note that spaces are also used at the start/end of a local
alignment.

Prior releases just used the pipe character to indicate
aligned region (matches, mismatches and gaps).
"""
s = []
s.append("%s\n" % align1)
s.append(" " * begin)
for a, b in zip(align1[begin:end], align2[begin:end]):
if a == b:
s.append("|")  # match
elif a == "-" or b == "-":
s.append(" ")  # gap
else:
s.append(".")  # mismatch
s.append("\n")
s.append("%s\n" % align2)
s.append("  Score=%g\n" % score)

## Obtain indices of matching characters (indicated by the "|" character)
c = []
for pos, char in enumerate(s):
pipe = "|"
if char == pipe:
c.append(pos-2)

return(c)
return ''.join(s)

alignments = pairwise2.align.globalxx(' EEEEE      HHH     HHH             EEEEE', 'EEE       EEEE       HHH')
print(format_alignment(*alignments[0]))
print(match_index(*alignments[0]))
``````

Results:

`````` EEEEE      HHH     HHH      ----       EEEEE---
|||             |   ||||||    |||||||
---EEE------------- ---      EEEE       -----HHH
Score=17

[3, 4, 5, 19, 23, 24, 25, 26, 27, 28, 33, 34, 35, 36, 37, 38, 39]
``````

The other caveat is that I was unable to exactly replicate your alignment without the code that you used, so the alignment depicted here is different than the one in your example. Hope this helps.