Question: PAM matrix calculation
0

Hi everyone,

I've been working on a personal project in PHP, namely an alignment algoritm using Needlemann-Wunsh. I've added some fixed PAM and BLOSSUM matrices however I wanted to add the option to use a variable PAM matrix. For instance PAM65 would be (PAM1)^65. After implementing everything in PHP and running it, I get the correct PAM250 matrix when compared to the official PAM250 matrix of NCBI (some rounding errors probably). However when testing the PAM30 and comparing that one I get totally different results.

My matrix multiplication code looks like this:

``````function matrixmult(\$m1,\$m2){
\$r=count(\$m1);
\$c=count(\$m2);
\$p=count(\$m2);
if(count(\$m1)!=\$p){throw new Exception('Incompatible matrixes');}
\$m3=array();
for (\$i=0;\$i< \$r;\$i++){
for(\$j=0;\$j<\$c;\$j++){
\$m3[\$i][\$j]=0;
for(\$k=0;\$k<\$p;\$k++){
\$m3[\$i][\$j]+=\$m1[\$i][\$k]*\$m2[\$k][\$j];
}
}
}

for (\$i = 0; \$i < \$r; \$i++) {
\$sum=0;
for (\$j = 0; \$j < \$c; \$j++) {
\$sum+=\$m3[\$j][\$i];
}
\$sum-=\$m3[\$i][\$i];
echo \$sum;
\$m3[\$i][\$i]=1-\$sum;
}
return(\$m3);
}
``````

This would then be run N-1 times with m1 being PAM(i) and m2 being PAM1.

The second part of the function makes sure that the sum of each column = 1

Next I would calculate the Log-odds scoring matrix as following:

``````foreach (\$PAMN2 as \$key => \$line) {
foreach (\$line as \$key2 => \$word) {
\$scomat[\$key][\$key2]=round(10*log((\$word/(\$rm[\$key])),10));
}
}
``````

With the keys being amino acid 1 letter codes and rm being the relative frequencies of the amino acids.

I did not really understand how the log odds are symmetric so I just copied the lower half of the scoring matrix to the upper half:

``````\$allkeys = array_keys(\$scomat);
for (\$i = 0; \$i < 20; \$i++) {
for (\$j = 0; \$j < \$i; \$j++) {
\$scomat[\$allkeys[\$j]][\$allkeys[\$i]]=\$scomat[\$allkeys[\$i]][\$allkeys[\$j]];
}
}
``````

I wondered if anyone knows why the scoring matrix is wrong? The PAM1 that I use is the standard one as well as the amino acid frequencies.

Best regards

Cedric