Question

log-odds score from PAM matrix

6.5 years ago

ericpellegrini76 • 0

Hi all,

I wrote a small prototype that computes PAM matrices, and the matrices it generates agree with other sources. However, I run into trouble when deriving the score matrix from a PAM matrix. Depending on the order (e.g. PAM10, PAM250), I have to choose a different base for the logarithm to make my score matrix match the reference ones found, for instance, on the NCBI FTP site (ftp://ftp.ncbi.nih.gov/blast/matrices).

Specifically:

for PAM10: S10 = 2*log2(PAM10/f)
for PAM250: S250 = 10.0*log10(PAM250/f)

where f denotes the normalized amino acid frequencies. I am puzzled by this apparent inconsistency between the formulas, so I have probably misunderstood something. I cannot find anything about it in Dayhoff's seminal paper. Do you have any idea what I am doing wrong? Thanks.
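For reference, here is a minimal sketch of the two formulas above, written as one function with the multiplier and the logarithm base as parameters. The function name `log_odds`, the toy two-letter alphabet and the convention that `pam[i, j]` is the probability of residue j being replaced by residue i (columns summing to 1) are assumptions made for illustration, not something taken from the NCBI files.

```python
import numpy as np

def log_odds(pam, freqs, multiplier, base):
    """Return multiplier * log_base(pam[i, j] / freqs[i]) without rounding.

    Assumption: pam[i, j] is the probability that residue j is replaced
    by residue i, so each column of pam sums to 1.
    """
    pam = np.asarray(pam, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    # Divide row i by f_i to get the odds matrix.
    odds = pam / freqs[:, np.newaxis]
    # Change of base: log_base(x) = ln(x) / ln(base).
    return multiplier * np.log(odds) / np.log(base)

# Hypothetical two-letter alphabet, chosen so that pam[i, j] * f[j] is symmetric.
toy_freqs = np.array([0.6, 0.4])
toy_pam = np.array([[0.90, 0.15],
                    [0.10, 0.85]])

s_2log2 = log_odds(toy_pam, toy_freqs, multiplier=2, base=2)
s_10log10 = log_odds(toy_pam, toy_freqs, multiplier=10, base=10)
```

In this sketch the two results differ only by the constant factor 5*log10(2), about 1.505, in every cell; published integer matrices would additionally round each entry.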

alignment • 2.9k views

Why would you not expect different results from these (they are different algorithms)?

ADD REPLY • link 6.5 years ago by Kevin Blighe 87k

Thanks for the reply. The NCBI matrices (see link above) provide log-odds scores on different logarithmic scales. That puzzles me because the standard formula for the score matrix is unique: the log of the PAM matrix divided by the amino acid frequencies. Whatever logarithmic base is used in that formula, I would expect it to stay constant regardless of the PAM number (2, 10, 50, 250, ...). Otherwise, how can one compare sequence alignments performed with PAM10, PAM50 or PAM250?
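To make the comparison concrete: a score of the form m*log_b(odds) can be put on a common scale by converting it to bits, since log2(x) = log_b(x) * log2(b). The sketch below assumes unrounded scores, and the helper name `score_to_bits` is hypothetical; it only illustrates the change of base, not how any particular alignment tool normalizes scores.

```python
import math

def score_to_bits(score, multiplier, base):
    """Convert a score of the form multiplier * log_base(odds) to bits."""
    # log2(odds) = log_base(odds) * log2(base), and log_base(odds) = score / multiplier.
    return score * math.log2(base) / multiplier

# The same odds ratio expressed on the two scales mentioned in this thread.
odds = 4.0
score_a = 2 * math.log2(odds)      # 2*log2 scale (half-bit units)
score_b = 10 * math.log10(odds)    # 10*log10 scale (roughly third-bit units)

bits_a = score_to_bits(score_a, multiplier=2, base=2)
bits_b = score_to_bits(score_b, multiplier=10, base=10)
```

Both conversions recover log2(odds), so under these assumptions the two scales carry the same information and differ only in their unit.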

ADD REPLY • link 6.5 years ago by ericpellegrini76 • 0