using python for pairwise alignment
2
5
Entering edit mode
8.8 years ago

Hi

I'm writing a Python program and I have to do a pairwise alignment on several thousand DNA sequences. I looked at Biopython but I couldn't fine a function to do a pairwise alignment, this may be my mistake. The function should have gap penalty, gap open, gap extension and Smith Waterman or Needleman Wunsch. Can anyone tell me where I can find a good Python package for doing this kind of alignment?

pairwise-alignment python • 21k views
0
Entering edit mode
4
Entering edit mode
8.8 years ago
João Rodrigues ★ 2.5k

Hi,

BioPython does exactly what you are asking, you probably looked in the wrong place :) You should look at the Bio.pairwise2 module. See the following example for global pairwise alignment:

from Bio import pairwise2
from Bio.SubsMat import MatrixInfo as matlist

matrix = matlist.blosum62

gap_open = -10
gap_extend = -0.5

# Only using the first 60 aas for visualization reasons..

p53_human = "MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP"
p53_mouse = "MEESQSDISLELPLSQETFSGLWKLLPPEDILPSPHCMDDLLLPQDVEEFFEGPSEALRV"

alns = pairwise2.align.globalds(p53_human, p53_mouse, matrix, gap_open, gap_extend)
top_aln = alns[0]
aln_human, aln_mouse, score, begin, end = top_aln
print aln_human+'\n'+aln_mouse

MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP----
MEESQSDISLELPLSQETFSGLWKLLPPEDIL-PSP-HCMDDLLL-PQDVEEFF-EGPSEALRV

# Comparing with EBI NEEDLE output
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP------
MEESQSDISLELPLSQETFSGLWKLLPPEDIL-PSP-HCMDDLLL-PQDVEEFF---EGPSEALRV

0
Entering edit mode
8.8 years ago

Hi jellevandewege,

You can use the Bio.Emboss python package for Needleman Wunsch alignment (It is essentially a wrapper around the program).

Here is a short example copied from my code base:

from Bio.Emboss.Applications import NeedleCommandline
from Bio import AlignIO

needle_cli = NeedleCommandline(asequence=seq_fname1, \
bsequence=seq_fname2, \
gapopen=10, \
gapextend=0.5, \
outfile=needle_fname)

"""This generates the needle file"""
needle_cli()
"""That parses the needle file, aln[0] and aln[1] contain the aligned
first and second sequence in the usual format (e.g. - for a gap)"""


Yours,
Mahmoud

0
Entering edit mode

is it possible to perform pairwise sequence alignment sequence 1:target sequence sequence 2: sequence extract from NCBI PDB based on PDB id like 1mkp