Question: needleman-wunsch algorithm on python
1
gravatar for elisheva
3.9 years ago by
elisheva100
Israel
elisheva100 wrote:

hello everyone.

I have to execute the needleman-wunsch algorithm on python for global sequence alignment.

I have to fill 1 matrix withe all the values according to the penalty of match, mismatch, and gap.

And another matrix as pointers matrix - where "v" for vertical, "H" for horizontal and "D" for diagonal.

for example: for the sequences: TCGCA and TCCA, match:1, mismatch:-1 and gap:-2

the output should be:

           T   C   G   C   A
   [[  0  -2  -4  -6  -8 -10]
T  [ -2   1  -1  -3  -5  -7]
C  [ -4  -1   2   0  -2  -4]
C  [ -6  -3   0   1   1  -1]
A  [ -8  -5  -2  -1   0   2]]

         T   C   G   C   A
  [['0' 'H' 'H' 'H' 'H' 'H']
T ['V' 'D' 'H' 'H' 'H' 'H']
C ['V' 'V' 'D' 'H' 'D' 'H']
C ['V' 'V' 'D' 'D' 'D' 'H']
A ['V' 'V' 'V' 'D' 'D' 'D']]

And my program output is:

           T   C   G   C   A
   [[  0  -2  -4  -6  -8 -10]
T  [ -2   1  -1  -3  -5  -7]
C  [ -4  -1   2   0  -2  -4]
C  [ -6  -3   0   1   1  -1]
A  [ -8  -5  -2  -1   0   2]]

         T   C   G   C   A
  [['0' 'H' 'H' 'H' 'H' 'H']
T ['V' 'D' 'V' 'V' 'V' 'V']
C ['V' 'H' 'D' 'V' 'D' 'V']
C ['V' 'H' 'D' 'D' 'D' 'V']
A ['V' 'H' 'H' 'D' 'D' 'D']]

This is my code:

import numpy as np
from string import *

#-------------------------------------------------------
#This function returns to values for cae of match or mismatch
def Diagonal(n1,n2,pt):
    if(n1 == n2):
        return pt['MATCH']
    else:
        return pt['MISMATCH']

#------------------------------------------------------------   
#This function gets the optional elements of the aligment matrix and returns the elements for the pointers matrix.
def Pointers(di,ho,ve):

    pointer = max(di,ho,ve) #based on python default maximum(return the first element).

    if(di == pointer):
        return 'D'
    elif(ho == pointer):
        return 'H'
    else:
         return 'V'    

#--------------------------------------------------------
#This function creates the aligment and pointers matrices
def NW(s1,s2,match = 1,mismatch = -1, gap = -2):
    penalty = {'MATCH': match, 'MISMATCH': mismatch, 'GAP': gap} #A dictionary for all the penalty valuse.
    n = len(s1) + 1 #The dimension of the matrix columns.
    m = len(s2) + 1 #The dimension of the matrix rows.
    al_mat = np.zeros((m,n),dtype = int) #Initializes the alighment matrix with zeros.
    p_mat = np.zeros((m,n),dtype = str) #Initializes the alighment matrix with zeros.
    #Scans all the first rows element in the matrix and fill it with "gap penalty"
    for i in range(m):
        al_mat[i][0] = penalty['GAP'] * i
        p_mat[i][0] = 'V'
    #Scans all the first columns element in the matrix and fill it with "gap penalty"
    for j in range (n):
        al_mat[0][j] = penalty['GAP'] * j
        p_mat [0][j] = 'H'
    #Fill the matrix with the correct values.

    p_mat [0][0] = 0 #Return the first element of the pointer matrix back to 0.
    for i in range(1,m):
        for j in range(1,n):
            di = al_mat[i-1][j-1] + Diagonal(s1[j-1],s2[i-1],penalty) #The value for match/mismatch -  diagonal.
            ho = al_mat[i-1][j] + penalty['GAP'] #The value for gap - horizontal.(from the left cell)
            ve = al_mat[i][j-1] + penalty['GAP'] #The value for gap - vertical.(from the upper cell)
            al_mat[i][j] = max(di,ho,ve) #Fill the matrix with the maximal value.(based on the python default maximum)
            p_mat[i][j] = Pointers(di,ho,ve)
    print np.matrix(al_mat) 
    print np.matrix(p_mat)

Can anybody find what the problem is?

Thanks!!

python • 21k views
ADD COMMENTlink modified 3.9 years ago • written 3.9 years ago by elisheva100
5
gravatar for Devon Ryan
3.9 years ago by
Devon Ryan97k
Freiburg, Germany
Devon Ryan97k wrote:

There are a couple problems, not least of which that what you think your program should output is wrong. Below is the corrected program:

def NW(s1,s2,match = 1,mismatch = -1, gap = -2):
    penalty = {'MATCH': match, 'MISMATCH': mismatch, 'GAP': gap} #A dictionary for all the penalty valuse.
    n = len(s1) + 1 #The dimension of the matrix columns.
    m = len(s2) + 1 #The dimension of the matrix rows.
    al_mat = np.zeros((m,n),dtype = int) #Initializes the alighment matrix with zeros.
    p_mat = np.zeros((m,n),dtype = str) #Initializes the alighment matrix with zeros.
    #Scans all the first rows element in the matrix and fill it with "gap penalty"
    for i in range(m):
        al_mat[i][0] = penalty['GAP'] * i
        p_mat[i][0] = 'V'
    #Scans all the first columns element in the matrix and fill it with "gap penalty"
    for j in range (n):
        al_mat[0][j] = penalty['GAP'] * j
        p_mat [0][j] = 'H'
    #Fill the matrix with the correct values.

    p_mat [0][0] = 0 #Return the first element of the pointer matrix back to 0.
    for i in range(1,m):
        for j in range(1,n):
            di = al_mat[i-1][j-1] + Diagonal(s1[j-1],s2[i-1],penalty) #The value for match/mismatch -  diagonal.
            ho = al_mat[i][j-1] + penalty['GAP'] #The value for gap - horizontal.(from the left cell)
            ve = al_mat[i-1][j] + penalty['GAP'] #The value for gap - vertical.(from the upper cell)
            al_mat[i][j] = max(di,ho,ve) #Fill the matrix with the maximal value.(based on the python default maximum)
            p_mat[i][j] = Pointers(di,ho,ve)
    print np.matrix(al_mat)
    print np.matrix(p_mat)
  1. ho and ve were swapped
  2. di and so on were using the updated penalties if a given path were followed. Instead, you want to go to the greater of the neighboring points.
ADD COMMENTlink modified 3.9 years ago • written 3.9 years ago by Devon Ryan97k

...what you think your program should output is wrong.

Ooops, I didn't consider that :)

ADD REPLYlink written 3.9 years ago by Brian Bushnell17k

I didn't understand why my program should output is wrong. Isn't mismatch better than gap? And also according to ncbi that's realy the right answer:

Query  1  TCGCA  5
          || ||
Sbjct  1  TC-CA  4
ADD REPLYlink modified 3.9 years ago • written 3.9 years ago by elisheva100

In and of itself, yes, but the neighboring scores represent the score of the best-possible path leading to that point...

ADD REPLYlink written 3.9 years ago by Devon Ryan97k

But what you'v changed does'nt work on this case. its output is:

[[  0  -2  -4  -6  -8 -10]
 [ -2   1  -1  -3  -5  -7]
 [ -4  -1   2   0  -2  -4]
 [ -6  -3   0   1   1  -1]
 [ -8  -5  -2  -1   0   2]]

[['0' 'H' 'H' 'H' 'H' 'H']
 ['V' 'D' 'H' 'H' 'H' 'H']
 ['V' 'V' 'D' 'H' 'H' 'H']
 ['V' 'V' 'V' 'D' 'H' 'H']
 ['V' 'V' 'V' 'V' 'D' 'D']]

And as I wrote on top the ncbi indeed gives my expected result:

Query  1  TCGCA  5
          || ||
Sbjct  1  TC-CA  4
ADD REPLYlink written 3.9 years ago by elisheva100

Oh, wait a second. It looks to me like your initial output was already correct and it's just your expected output that's wrong.

ADD REPLYlink written 3.9 years ago by Brian Bushnell17k

No, my output is wrong. 'v' means up and 'H' left

ADD REPLYlink written 3.9 years ago by elisheva100

Yep, that's correct. So:

         T   C   G   C   A
  [['0' 'H' 'H' 'H' 'H' 'H']
T ['V' 'D' 'V' 'V' 'V' 'V']
C ['V' 'H' 'D' 'V' 'D' 'V']
C ['V' 'H' 'D' 'D' 'D' 'V']
A ['V' 'H' 'H' 'D' 'D' 'D']]

That looks correct to me. A path to get to the bottom right could horizontally along the top for a while (through Hs) then down for a while (through Vs) before it gets to a D and finishes along the diagonal. That's what it's supposed to look like.

ADD REPLYlink written 3.9 years ago by Brian Bushnell17k

Indeed, ignore my point 2, it was just the swapping of ve and ho. That's what I get for not checking.

I've corrected the code in the answer.

ADD REPLYlink modified 3.9 years ago • written 3.9 years ago by Devon Ryan97k
0
gravatar for Brian Bushnell
3.9 years ago by
Walnut Creek, USA
Brian Bushnell17k wrote:

Looks like you can change Pointers like this:

if(di == pointer):
    return 'D'
elif(ho == pointer):
    return 'V'
else:
     return 'H'
ADD COMMENTlink written 3.9 years ago by Brian Bushnell17k

It does'nt explain the problem

ADD REPLYlink written 3.9 years ago by elisheva100
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 965 users visited in the last hour