Question: Implementation of Blosum62 in the source code of global pairwise alignment of proteins

Hi,

I am trying to implement pairwise protein sequence alignment using the Needleman-Wunsch global alignment algorithm. I am using VB.NET.

I am not clear on how to include the BLOSUM62 matrix in my source code to do the scoring, that is, to fill the two-dimensional alignment matrix.

I have googled and found that most people suggest using a flat file that contains the standard BLOSUM62 matrix. Does that mean I need to read this flat file and fill my own in-code BLOSUM62 matrix from it?

The other approach could be to use some mathematical formula in the program logic to construct the BLOSUM62 matrix, but I am not sure about this option.

Any ideas or insights are appreciated.

Also, is there any pseudocode available for global pairwise alignment of proteins? I tried to find the basic steps of the algorithm online but had no luck, so I am planning to follow the same steps as I did for the global pairwise alignment of nucleotides.

Thanks.

There are no mathematical formulas for this.

What you need is a data structure from which you can retrieve the score for each substitution you observe. It could be as simple as a hash map. For example, in Python you could initialize it like so:

```python
blosum = dict()
blosum['Ala'] = dict()
blosum['Ala']['Ala'] = 4
blosum['Ala']['Arg'] = -1
blosum['Ala']['Asn'] = -2
# ... etc ...
```

Of course, you would not initialize it by hand; the information should be read from a file, so that you can load different scoring matrices. Later, during alignment, when you observe an Ala -> Arg substitution, you could retrieve the value as:

```python
blosum['Ala']['Arg']
```

Use the corresponding data structure from your programming language to build the same construct.

Thanks, Istvan, for your suggestion. I will work along the same lines.

My answer comes late but I just discovered this web site.

Instead of writing the BLOSUM matrix into a data structure by hand, I think it is better to write a function that reads the matrix from a text file.

That way, if you want to try a scoring matrix other than BLOSUM62, you just have to read another file.
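To make the file-reading idea concrete, here is a minimal Python sketch. It assumes the common whitespace-separated layout used for substitution matrix files: optional `#` comment lines, one header line of column residues, then one row per residue. The function name `parse_matrix` and the `SAMPLE` text are made up for illustration, and the sample is only a three-residue excerpt of BLOSUM62 in one-letter codes, not the full matrix.

```python
def parse_matrix(text):
    """Parse a whitespace-separated substitution matrix into a nested dict.

    Assumed layout: '#' comment lines, a header line listing the column
    residues, then one row per residue (row label followed by scores).
    """
    matrix = {}
    columns = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        fields = line.split()
        if columns is None:
            columns = fields  # first non-comment line is the column header
            continue
        row, values = fields[0], fields[1:]
        matrix[row] = {col: int(v) for col, v in zip(columns, values)}
    return matrix


# Excerpt only: scores for A, R and N taken from BLOSUM62.
SAMPLE = """\
# Excerpt of BLOSUM62 (illustration only)
   A  R  N
A  4 -1 -2
R -1  5  0
N -2  0  6
"""

blosum = parse_matrix(SAMPLE)
print(blosum['A']['R'])  # -1
```

With a real matrix file you would pass the file contents instead of `SAMPLE`, for example `parse_matrix(open(path).read())`, where `path` points to whichever matrix file you downloaded. Swapping in a different scoring matrix then means changing only that path.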

Bilou.

The Needleman-Wunsch algorithm is a simple dynamic programming approach. Perhaps this page can help you with pseudocode: http://en.wikipedia.org/wiki/Dynamic_programming
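For reference, here is a rough Python sketch of the dynamic programming steps. The steps are the same for proteins as for nucleotides; only the scoring function changes. The assumptions here are mine: a linear gap penalty (the default value of -8 is arbitrary), the names `needleman_wunsch` and `toy_score`, and a toy match/mismatch scorer that stands in for a real BLOSUM62 lookup.

```python
def needleman_wunsch(seq1, seq2, score, gap=-8):
    """Global alignment with a linear gap penalty (gap value is an assumption).

    `score(a, b)` returns the substitution score for residues a and b.
    Returns (best score, aligned seq1, aligned seq2).
    """
    n, m = len(seq1), len(seq2)
    # F[i][j] = best score for aligning seq1[:i] with seq2[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # seq1 prefix aligned against gaps only
    for j in range(1, m + 1):
        F[0][j] = j * gap  # seq2 prefix aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(seq1[i - 1], seq2[j - 1]),  # substitution
                F[i - 1][j] + gap,  # gap in seq2
                F[i][j - 1] + gap,  # gap in seq1
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out1, out2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and \
                F[i][j] == F[i - 1][j - 1] + score(seq1[i - 1], seq2[j - 1]):
            out1.append(seq1[i - 1])
            out2.append(seq2[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out1.append(seq1[i - 1])
            out2.append('-')
            i -= 1
        else:
            out1.append('-')
            out2.append(seq2[j - 1])
            j -= 1
    return F[n][m], ''.join(reversed(out1)), ''.join(reversed(out2))


def toy_score(a, b):
    """Placeholder scorer: +1 match, -1 mismatch (not BLOSUM62)."""
    return 1 if a == b else -1


print(needleman_wunsch("HEAGAWGHEE", "PAWHEAE", toy_score, gap=-1))
```

To score with BLOSUM62 instead, pass a function that looks the pair up in your matrix, for example `lambda a, b: blosum[a][b]` if `blosum` is a nested dictionary keyed by one-letter codes.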

Sorry, I only speak Java here.

I would create an interface ScoreMatrix:

```java
public interface ScoreMatrix
{
    public int getScore(char aa1, char aa2);
}
```

That would be used by your AlignmentTool:

```java
public interface AlignmentTool
{
    public void setScoreMatrix(ScoreMatrix m);
    public ScoreMatrix getScoreMatrix();
    public void align(String seq1, String seq2);
    // (...)
}
```

And Blosum62 would be an implementation of ScoreMatrix:

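```java
public class Blosum62 implements ScoreMatrix
{
    public int getScore(char aa1, char aa2)
    {
        switch (Character.toUpperCase(aa1))
        {
            // (...) cases for the other residues
            case 'A':
                switch (Character.toUpperCase(aa2))
                {
                    // (...)
                    case 'A': return 4; // BLOSUM62 score for Ala/Ala
                    // (...)
                }
                break;
            // (...)
        }
        // Reached only when the pair is not covered by the cases above.
        throw new IllegalArgumentException("No score for " + aa1 + "/" + aa2);
    }
}
```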