Question

needleman-wunsch algorithm on python

2

Entering edit mode

7.3 years ago

elisheva ▴ 120

hello everyone.

I have to execute the needleman-wunsch algorithm on python for global sequence alignment.

I have to fill 1 matrix withe all the values according to the penalty of match, mismatch, and gap.

And another matrix as pointers matrix - where "v" for vertical, "H" for horizontal and "D" for diagonal.

for example: for the sequences: TCGCA and TCCA, match:1, mismatch:-1 and gap:-2

the output should be:

           T   C   G   C   A
   [[  0  -2  -4  -6  -8 -10]
T  [ -2   1  -1  -3  -5  -7]
C  [ -4  -1   2   0  -2  -4]
C  [ -6  -3   0   1   1  -1]
A  [ -8  -5  -2  -1   0   2]]

         T   C   G   C   A
  [['0' 'H' 'H' 'H' 'H' 'H']
T ['V' 'D' 'H' 'H' 'H' 'H']
C ['V' 'V' 'D' 'H' 'D' 'H']
C ['V' 'V' 'D' 'D' 'D' 'H']
A ['V' 'V' 'V' 'D' 'D' 'D']]

And my program output is:

           T   C   G   C   A
   [[  0  -2  -4  -6  -8 -10]
T  [ -2   1  -1  -3  -5  -7]
C  [ -4  -1   2   0  -2  -4]
C  [ -6  -3   0   1   1  -1]
A  [ -8  -5  -2  -1   0   2]]

         T   C   G   C   A
  [['0' 'H' 'H' 'H' 'H' 'H']
T ['V' 'D' 'V' 'V' 'V' 'V']
C ['V' 'H' 'D' 'V' 'D' 'V']
C ['V' 'H' 'D' 'D' 'D' 'V']
A ['V' 'H' 'H' 'D' 'D' 'D']]

This is my code:

import numpy as np
from string import *

#-------------------------------------------------------
#This function returns to values for cae of match or mismatch
def Diagonal(n1,n2,pt):
    if(n1 == n2):
        return pt['MATCH']
    else:
        return pt['MISMATCH']

#------------------------------------------------------------   
#This function gets the optional elements of the aligment matrix and returns the elements for the pointers matrix.
def Pointers(di,ho,ve):

    pointer = max(di,ho,ve) #based on python default maximum(return the first element).

    if(di == pointer):
        return 'D'
    elif(ho == pointer):
        return 'H'
    else:
         return 'V'    

#--------------------------------------------------------
#This function creates the aligment and pointers matrices
def NW(s1,s2,match = 1,mismatch = -1, gap = -2):
    penalty = {'MATCH': match, 'MISMATCH': mismatch, 'GAP': gap} #A dictionary for all the penalty valuse.
    n = len(s1) + 1 #The dimension of the matrix columns.
    m = len(s2) + 1 #The dimension of the matrix rows.
    al_mat = np.zeros((m,n),dtype = int) #Initializes the alighment matrix with zeros.
    p_mat = np.zeros((m,n),dtype = str) #Initializes the alighment matrix with zeros.
    #Scans all the first rows element in the matrix and fill it with "gap penalty"
    for i in range(m):
        al_mat[i][0] = penalty['GAP'] * i
        p_mat[i][0] = 'V'
    #Scans all the first columns element in the matrix and fill it with "gap penalty"
    for j in range (n):
        al_mat[0][j] = penalty['GAP'] * j
        p_mat [0][j] = 'H'
    #Fill the matrix with the correct values.

    p_mat [0][0] = 0 #Return the first element of the pointer matrix back to 0.
    for i in range(1,m):
        for j in range(1,n):
            di = al_mat[i-1][j-1] + Diagonal(s1[j-1],s2[i-1],penalty) #The value for match/mismatch -  diagonal.
            ho = al_mat[i-1][j] + penalty['GAP'] #The value for gap - horizontal.(from the left cell)
            ve = al_mat[i][j-1] + penalty['GAP'] #The value for gap - vertical.(from the upper cell)
            al_mat[i][j] = max(di,ho,ve) #Fill the matrix with the maximal value.(based on the python default maximum)
            p_mat[i][j] = Pointers(di,ho,ve)
    print np.matrix(al_mat) 
    print np.matrix(p_mat)

Can anybody find what the problem is?

Thanks!!

python • 31k views

ADD COMMENT • link 7.3 years ago by elisheva ▴ 120

0

Entering edit mode

7.3 years ago

Brian Bushnell 20k

Looks like you can change Pointers like this:

if(di == pointer):
    return 'D'
elif(ho == pointer):
    return 'V'
else:
     return 'H'

ADD COMMENT • link 7.3 years ago by Brian Bushnell 20k

0

Entering edit mode

It does'nt explain the problem

ADD REPLY • link 7.3 years ago by elisheva ▴ 120

score 6 · Accepted Answer · 2017-01-12

6

Entering edit mode

7.3 years ago

Devon Ryan 104k

There are a couple problems, not least of which that what you think your program should output is wrong. Below is the corrected program:

def NW(s1,s2,match = 1,mismatch = -1, gap = -2):
    penalty = {'MATCH': match, 'MISMATCH': mismatch, 'GAP': gap} #A dictionary for all the penalty valuse.
    n = len(s1) + 1 #The dimension of the matrix columns.
    m = len(s2) + 1 #The dimension of the matrix rows.
    al_mat = np.zeros((m,n),dtype = int) #Initializes the alighment matrix with zeros.
    p_mat = np.zeros((m,n),dtype = str) #Initializes the alighment matrix with zeros.
    #Scans all the first rows element in the matrix and fill it with "gap penalty"
    for i in range(m):
        al_mat[i][0] = penalty['GAP'] * i
        p_mat[i][0] = 'V'
    #Scans all the first columns element in the matrix and fill it with "gap penalty"
    for j in range (n):
        al_mat[0][j] = penalty['GAP'] * j
        p_mat [0][j] = 'H'
    #Fill the matrix with the correct values.

    p_mat [0][0] = 0 #Return the first element of the pointer matrix back to 0.
    for i in range(1,m):
        for j in range(1,n):
            di = al_mat[i-1][j-1] + Diagonal(s1[j-1],s2[i-1],penalty) #The value for match/mismatch -  diagonal.
            ho = al_mat[i][j-1] + penalty['GAP'] #The value for gap - horizontal.(from the left cell)
            ve = al_mat[i-1][j] + penalty['GAP'] #The value for gap - vertical.(from the upper cell)
            al_mat[i][j] = max(di,ho,ve) #Fill the matrix with the maximal value.(based on the python default maximum)
            p_mat[i][j] = Pointers(di,ho,ve)
    print np.matrix(al_mat)
    print np.matrix(p_mat)

ho and ve were swapped
~~di and so on were using the updated penalties if a given path were followed. Instead, you want to go to the greater of the neighboring points.~~

ADD COMMENT • link 7.3 years ago by Devon Ryan 104k

0

Entering edit mode

...what you think your program should output is wrong.

Ooops, I didn't consider that :)

ADD REPLY • link 7.3 years ago by Brian Bushnell 20k

0

Entering edit mode

I didn't understand why my program should output is wrong. Isn't mismatch better than gap? And also according to ncbi that's realy the right answer:

Query  1  TCGCA  5
          || ||
Sbjct  1  TC-CA  4

ADD REPLY • link 7.3 years ago by elisheva ▴ 120

0

Entering edit mode

In and of itself, yes, but the neighboring scores represent the score of the best-possible path leading to that point...

ADD REPLY • link 7.3 years ago by Devon Ryan 104k

0

Entering edit mode

But what you'v changed does'nt work on this case. its output is:

[[  0  -2  -4  -6  -8 -10]
 [ -2   1  -1  -3  -5  -7]
 [ -4  -1   2   0  -2  -4]
 [ -6  -3   0   1   1  -1]
 [ -8  -5  -2  -1   0   2]]

[['0' 'H' 'H' 'H' 'H' 'H']
 ['V' 'D' 'H' 'H' 'H' 'H']
 ['V' 'V' 'D' 'H' 'H' 'H']
 ['V' 'V' 'V' 'D' 'H' 'H']
 ['V' 'V' 'V' 'V' 'D' 'D']]

And as I wrote on top the ncbi indeed gives my expected result:

Query  1  TCGCA  5
          || ||
Sbjct  1  TC-CA  4

ADD REPLY • link 7.3 years ago by elisheva ▴ 120

0

Entering edit mode

Oh, wait a second. It looks to me like your initial output was already correct and it's just your expected output that's wrong.

ADD REPLY • link 7.3 years ago by Brian Bushnell 20k

0

Entering edit mode

No, my output is wrong. 'v' means up and 'H' left

ADD REPLY • link 7.3 years ago by elisheva ▴ 120

0

Entering edit mode

Yep, that's correct. So:

         T   C   G   C   A
  [['0' 'H' 'H' 'H' 'H' 'H']
T ['V' 'D' 'V' 'V' 'V' 'V']
C ['V' 'H' 'D' 'V' 'D' 'V']
C ['V' 'H' 'D' 'D' 'D' 'V']
A ['V' 'H' 'H' 'D' 'D' 'D']]

That looks correct to me. A path to get to the bottom right could horizontally along the top for a while (through Hs) then down for a while (through Vs) before it gets to a D and finishes along the diagonal. That's what it's supposed to look like.

ADD REPLY • link 7.3 years ago by Brian Bushnell 20k

0

Entering edit mode

Indeed, ignore my point 2, it was just the swapping of ve and ho. That's what I get for not checking.

I've corrected the code in the answer.

ADD REPLY • link 7.3 years ago by Devon Ryan 104k