Hi all,
I wrote a small prototype that computes PAM matrices. The generated PAM matrices agree with other sources. However, I have trouble deriving the score matrix from the PAM matrix. Depending on the order (e.g. PAM10, PAM250), I have to choose a different logarithm base to make my score matrix match the reference ones found, for instance, on the NCBI FTP site (ftp://ftp.ncbi.nih.gov/blast/matrices).
Specifically:
- for PAM10: S10 = 2*log2(PAM10/f)
- for PAM250: S250 = 10.0*log10(PAM250/f)
where f are the normalized amino acid frequencies. I am puzzled by this (apparent) inconsistency between the formulas. I have probably misunderstood something. I cannot find anything about this in Dayhoff's seminal paper. Do you have any idea what I am doing wrong? Thanks
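For reference, here is a minimal Python sketch of the two formulas above, applied to the same odds ratios. The ratio values and the helper name `log_odds` are made up for illustration and are not taken from any real PAM matrix. It only shows how the two scalings relate numerically before any rounding to integers.

```python
import math

def log_odds(ratio, multiplier, base):
    """Scaled log-odds score: multiplier * log_base(ratio), left unrounded."""
    return multiplier * math.log(ratio, base)

# Hypothetical odds ratios (PAM entry divided by background frequency f).
# Illustrative values only, not from a real matrix.
ratios = [0.25, 0.5, 2.0, 8.0]

# 2*log2(x), the scaling that matched the PAM10 reference above
s_log2 = [log_odds(r, 2.0, 2) for r in ratios]
# 10*log10(x), the scaling that matched the PAM250 reference above
s_log10 = [log_odds(r, 10.0, 10) for r in ratios]

# Changing the base only rescales the score by a constant:
# 10*log10(x) = (5*log10(2)) * (2*log2(x)), a factor of about 1.505
scale = 5 * math.log10(2)

for r, a, b in zip(ratios, s_log2, s_log10):
    print(f"ratio={r:5.2f}  2*log2={a:7.3f}  10*log10={b:7.3f}  b/a={b / a:.4f}")
```

The ratio in the last column is the same for every entry, so the two formulas differ by a constant scale factor rather than in form. Note also that 10*log10(x) equals about 3.01*log2(x), so the second scaling is numerically close to 3*log2(x).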
Why would you not expect different results from these (they are different algorithms)?
Thanks for the reply. The NCBI implementation (see link above) provides log-odds on different logarithmic scales. That puzzles me because the standard formula for computing the score matrix is unique, i.e. the log of the PAM matrix divided by the amino acid frequencies. Whatever logarithm base is used in that formula, I would expect it to stay constant regardless of the PAM matrix number (2, 10, 50, 250, ...). Otherwise, how can one compare sequence alignments performed with PAM10, PAM50 or PAM250?