needleman-wunsch algorithm on python
2
2
Entering edit mode
5.5 years ago
elisheva ▴ 110

hello everyone.

I have to execute the needleman-wunsch algorithm on python for global sequence alignment.

I have to fill 1 matrix withe all the values according to the penalty of match, mismatch, and gap.

And another matrix as pointers matrix - where "v" for vertical, "H" for horizontal and "D" for diagonal.

for example: for the sequences: TCGCA and TCCA, match:1, mismatch:-1 and gap:-2

the output should be:

           T   C   G   C   A
[[  0  -2  -4  -6  -8 -10]
T  [ -2   1  -1  -3  -5  -7]
C  [ -4  -1   2   0  -2  -4]
C  [ -6  -3   0   1   1  -1]
A  [ -8  -5  -2  -1   0   2]]

T   C   G   C   A
[['0' 'H' 'H' 'H' 'H' 'H']
T ['V' 'D' 'H' 'H' 'H' 'H']
C ['V' 'V' 'D' 'H' 'D' 'H']
C ['V' 'V' 'D' 'D' 'D' 'H']
A ['V' 'V' 'V' 'D' 'D' 'D']]


And my program output is:

           T   C   G   C   A
[[  0  -2  -4  -6  -8 -10]
T  [ -2   1  -1  -3  -5  -7]
C  [ -4  -1   2   0  -2  -4]
C  [ -6  -3   0   1   1  -1]
A  [ -8  -5  -2  -1   0   2]]

T   C   G   C   A
[['0' 'H' 'H' 'H' 'H' 'H']
T ['V' 'D' 'V' 'V' 'V' 'V']
C ['V' 'H' 'D' 'V' 'D' 'V']
C ['V' 'H' 'D' 'D' 'D' 'V']
A ['V' 'H' 'H' 'D' 'D' 'D']]


## This is my code:

import numpy as np
from string import *

#-------------------------------------------------------
#This function returns to values for cae of match or mismatch
def Diagonal(n1,n2,pt):
if(n1 == n2):
return pt['MATCH']
else:
return pt['MISMATCH']

#------------------------------------------------------------
#This function gets the optional elements of the aligment matrix and returns the elements for the pointers matrix.
def Pointers(di,ho,ve):

pointer = max(di,ho,ve) #based on python default maximum(return the first element).

if(di == pointer):
return 'D'
elif(ho == pointer):
return 'H'
else:
return 'V'

#--------------------------------------------------------
#This function creates the aligment and pointers matrices
def NW(s1,s2,match = 1,mismatch = -1, gap = -2):
penalty = {'MATCH': match, 'MISMATCH': mismatch, 'GAP': gap} #A dictionary for all the penalty valuse.
n = len(s1) + 1 #The dimension of the matrix columns.
m = len(s2) + 1 #The dimension of the matrix rows.
al_mat = np.zeros((m,n),dtype = int) #Initializes the alighment matrix with zeros.
p_mat = np.zeros((m,n),dtype = str) #Initializes the alighment matrix with zeros.
#Scans all the first rows element in the matrix and fill it with "gap penalty"
for i in range(m):
al_mat[i][0] = penalty['GAP'] * i
p_mat[i][0] = 'V'
#Scans all the first columns element in the matrix and fill it with "gap penalty"
for j in range (n):
al_mat[0][j] = penalty['GAP'] * j
p_mat [0][j] = 'H'
#Fill the matrix with the correct values.

p_mat [0][0] = 0 #Return the first element of the pointer matrix back to 0.
for i in range(1,m):
for j in range(1,n):
di = al_mat[i-1][j-1] + Diagonal(s1[j-1],s2[i-1],penalty) #The value for match/mismatch -  diagonal.
ho = al_mat[i-1][j] + penalty['GAP'] #The value for gap - horizontal.(from the left cell)
ve = al_mat[i][j-1] + penalty['GAP'] #The value for gap - vertical.(from the upper cell)
al_mat[i][j] = max(di,ho,ve) #Fill the matrix with the maximal value.(based on the python default maximum)
p_mat[i][j] = Pointers(di,ho,ve)
print np.matrix(al_mat)
print np.matrix(p_mat)


Can anybody find what the problem is?

Thanks!!

python • 27k views
5
Entering edit mode
5.5 years ago

There are a couple problems, not least of which that what you think your program should output is wrong. Below is the corrected program:

def NW(s1,s2,match = 1,mismatch = -1, gap = -2):
penalty = {'MATCH': match, 'MISMATCH': mismatch, 'GAP': gap} #A dictionary for all the penalty valuse.
n = len(s1) + 1 #The dimension of the matrix columns.
m = len(s2) + 1 #The dimension of the matrix rows.
al_mat = np.zeros((m,n),dtype = int) #Initializes the alighment matrix with zeros.
p_mat = np.zeros((m,n),dtype = str) #Initializes the alighment matrix with zeros.
#Scans all the first rows element in the matrix and fill it with "gap penalty"
for i in range(m):
al_mat[i][0] = penalty['GAP'] * i
p_mat[i][0] = 'V'
#Scans all the first columns element in the matrix and fill it with "gap penalty"
for j in range (n):
al_mat[0][j] = penalty['GAP'] * j
p_mat [0][j] = 'H'
#Fill the matrix with the correct values.

p_mat [0][0] = 0 #Return the first element of the pointer matrix back to 0.
for i in range(1,m):
for j in range(1,n):
di = al_mat[i-1][j-1] + Diagonal(s1[j-1],s2[i-1],penalty) #The value for match/mismatch -  diagonal.
ho = al_mat[i][j-1] + penalty['GAP'] #The value for gap - horizontal.(from the left cell)
ve = al_mat[i-1][j] + penalty['GAP'] #The value for gap - vertical.(from the upper cell)
al_mat[i][j] = max(di,ho,ve) #Fill the matrix with the maximal value.(based on the python default maximum)
p_mat[i][j] = Pointers(di,ho,ve)
print np.matrix(al_mat)
print np.matrix(p_mat)

1. ho and ve were swapped
2. di and so on were using the updated penalties if a given path were followed. Instead, you want to go to the greater of the neighboring points.
0
Entering edit mode

...what you think your program should output is wrong.

Ooops, I didn't consider that :)

0
Entering edit mode

I didn't understand why my program should output is wrong. Isn't mismatch better than gap? And also according to ncbi that's realy the right answer:

Query  1  TCGCA  5
|| ||
Sbjct  1  TC-CA  4

0
Entering edit mode

In and of itself, yes, but the neighboring scores represent the score of the best-possible path leading to that point...

0
Entering edit mode

But what you'v changed does'nt work on this case. its output is:

[[  0  -2  -4  -6  -8 -10]
[ -2   1  -1  -3  -5  -7]
[ -4  -1   2   0  -2  -4]
[ -6  -3   0   1   1  -1]
[ -8  -5  -2  -1   0   2]]

[['0' 'H' 'H' 'H' 'H' 'H']
['V' 'D' 'H' 'H' 'H' 'H']
['V' 'V' 'D' 'H' 'H' 'H']
['V' 'V' 'V' 'D' 'H' 'H']
['V' 'V' 'V' 'V' 'D' 'D']]


And as I wrote on top the ncbi indeed gives my expected result:

Query  1  TCGCA  5
|| ||
Sbjct  1  TC-CA  4

0
Entering edit mode

Oh, wait a second. It looks to me like your initial output was already correct and it's just your expected output that's wrong.

0
Entering edit mode

No, my output is wrong. 'v' means up and 'H' left

0
Entering edit mode

Yep, that's correct. So:

         T   C   G   C   A
[['0' 'H' 'H' 'H' 'H' 'H']
T ['V' 'D' 'V' 'V' 'V' 'V']
C ['V' 'H' 'D' 'V' 'D' 'V']
C ['V' 'H' 'D' 'D' 'D' 'V']
A ['V' 'H' 'H' 'D' 'D' 'D']]


That looks correct to me. A path to get to the bottom right could horizontally along the top for a while (through Hs) then down for a while (through Vs) before it gets to a D and finishes along the diagonal. That's what it's supposed to look like.

0
Entering edit mode

Indeed, ignore my point 2, it was just the swapping of ve and ho. That's what I get for not checking.

I've corrected the code in the answer.

0
Entering edit mode
5.5 years ago

Looks like you can change Pointers like this:

if(di == pointer):
return 'D'
elif(ho == pointer):
return 'V'
else:
return 'H'

0
Entering edit mode

It does'nt explain the problem

Traffic: 917 users visited in the last hour
FAQ
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.