PAM matrix calculation
0
0
Entering edit mode
5.4 years ago

Hi everyone,

I've been working on a personal project in PHP, namely an alignment algoritm using Needlemann-Wunsh. I've added some fixed PAM and BLOSSUM matrices however I wanted to add the option to use a variable PAM matrix. For instance PAM65 would be (PAM1)^65. After implementing everything in PHP and running it, I get the correct PAM250 matrix when compared to the official PAM250 matrix of NCBI (some rounding errors probably). However when testing the PAM30 and comparing that one I get totally different results.

My matrix multiplication code looks like this:

function matrixmult($m1,$m2){
      $r=count($m1);
      $c=count($m2[0]);
      $p=count($m2);
      if(count($m1[0])!=$p){throw new Exception('Incompatible matrixes');}
      $m3=array();
      for ($i=0;$i< $r;$i++){
          for($j=0;$j<$c;$j++){
              $m3[$i][$j]=0;
              for($k=0;$k<$p;$k++){
                  $m3[$i][$j]+=$m1[$i][$k]*$m2[$k][$j];
              }
          }
      }

      for ($i = 0; $i < $r; $i++) {
         $sum=0;
         for ($j = 0; $j < $c; $j++) {
            $sum+=$m3[$j][$i];
         }
         $sum-=$m3[$i][$i];
         echo $sum;
         $m3[$i][$i]=1-$sum;
      }
      return($m3);
  }

This would then be run N-1 times with m1 being PAM(i) and m2 being PAM1.

The second part of the function makes sure that the sum of each column = 1

Next I would calculate the Log-odds scoring matrix as following:

foreach ($PAMN2 as $key => $line) {
        foreach ($line as $key2 => $word) {
           $scomat[$key][$key2]=round(10*log(($word/($rm[$key])),10));
        }
  }

With the keys being amino acid 1 letter codes and rm being the relative frequencies of the amino acids.

I did not really understand how the log odds are symmetric so I just copied the lower half of the scoring matrix to the upper half:

$allkeys = array_keys($scomat);
  for ($i = 0; $i < 20; $i++) {
     for ($j = 0; $j < $i; $j++) {
        $scomat[$allkeys[$j]][$allkeys[$i]]=$scomat[$allkeys[$i]][$allkeys[$j]];
     }
  }

I wondered if anyone knows why the scoring matrix is wrong? The PAM1 that I use is the standard one as well as the amino acid frequencies.

Best regards

Cedric

PAM Needlemann-Wunsh alignment • 2.6k views
ADD COMMENT

Login before adding your answer.

Traffic: 2806 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6