Question: index of the residue matched using pairwise2 in biopython
gravatar for akdbharadwajiitkgp
22 months ago by
akdbharadwajiitkgp0 wrote:

I am interested in knowing the index of residues that match to a string using pairwise 2 in python.

For example I have two strings

A:' EEEEE      HHH     HHH             EEEEE'


B: 'EEE       EEEE       HHH'

and I am interested in finding a the best match between both of them using local pairwise2 function in biopython. One of the alignments that I get is :

EEE-------EE---      HHH     HHH             EEEEE
|||       ||   |||||||||
EEE       EEEE       HHH--------------------------

I want to get the indices of the match i.e the positions of all the Es, Hs and ' ' from seq A that matched with seq B.

How do I do that?

ADD COMMENTlink modified 22 months ago by Russ460 • written 22 months ago by akdbharadwajiitkgp0
gravatar for finswimmer
22 months ago by
finswimmer13k wrote:


you could write a function which iterates over the alignment strings and look at which position are the same symbols.

from Bio import pairwise2

def match_index(alignment):
    matches = []

    for i, (a, b) in enumerate(zip(alignment[0], alignment[1])):
        if a == b:

    return matches

seq1 = " EEEEE      HHH     HHH             EEEEE"
seq2 = "EEE       EEEE       HHH"

alignments = pairwise2.align.localxx(seq1, seq2)
m = match_index(alignments[0])

fin swimmer

ADD COMMENTlink modified 22 months ago • written 22 months ago by finswimmer13k
gravatar for Russ
22 months ago by
Ontario Veterinary College, University of Guelph, Guelph, Ontario, Canada
Russ460 wrote:

I took a look at the pairwise2 module but also couldn't find any easy way to obtain the indices of the matches ("|"). I was able to modify the format_alignment method slightly to print a list of indices (that starts at 0) of the pipe operator for this specific example - but I'm not sure how it will perform under different testing conditions. Where this may fail is if the "begin" variable != 0 - you'll have to keep an eye out for that.

from Bio import pairwise2
from Bio.pairwise2 import format_alignment

## This is a modification of the format_alignment method 
def match_index(align1, align2, score, begin, end): 
      """Format the alignment prettily into a string. 

      Since Biopython 1.71 identical matches are shown with a pipe 
      character, mismatches as a dot, and gaps as a space. 

      Note that spaces are also used at the start/end of a local 

      Prior releases just used the pipe character to indicate  
      aligned region (matches, mismatches and gaps). 
      s = [] 
      s.append("%s\n" % align1) 
      s.append(" " * begin) 
      for a, b in zip(align1[begin:end], align2[begin:end]): 
          if a == b: 
              s.append("|")  # match 
          elif a == "-" or b == "-": 
              s.append(" ")  # gap 
              s.append(".")  # mismatch 
      s.append("%s\n" % align2) 
      s.append("  Score=%g\n" % score)

      ## Obtain indices of matching characters (indicated by the "|" character)
      c = []
      for pos, char in enumerate(s):
        pipe = "|"
        if char == pipe:

      return ''.join(s) 

alignments = pairwise2.align.globalxx(' EEEEE      HHH     HHH             EEEEE', 'EEE       EEEE       HHH')


 EEEEE      HHH     HHH      ----       EEEEE---
   |||             |   ||||||    |||||||        
---EEE------------- ---      EEEE       -----HHH

[3, 4, 5, 19, 23, 24, 25, 26, 27, 28, 33, 34, 35, 36, 37, 38, 39]

The other caveat is that I was unable to exactly replicate your alignment without the code that you used, so the alignment depicted here is different than the one in your example. Hope this helps.

ADD COMMENTlink written 22 months ago by Russ460
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2092 users visited in the last hour