I am puzzled by some inconsistencies between published equations for calculating log-odds scores for substitution matrices such as Blosum62.
In the seminal paper deriving the Blosum62 matrix, the odds ratio used to calculate the log-odds score is qij/eij, where qij is the observed probability of occurrence of the ij pair and eij is the expected probability of occurrence of the ij pair. According to this work, eij = PiPj for i = j, and eij = PiPj + PjPi = 2PiPj for i ≠ j. This is derived from basic probability theory.
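To make the definition of eij concrete, here is a minimal sketch for a toy three-letter alphabet. The background probabilities and the helper name `expected_unordered` are made up for illustration and do not come from any of the papers. It only checks that, with the factor of 2 for i ≠ j, the expected probabilities over unordered pairs sum to 1.

```python
from itertools import combinations_with_replacement

# Hypothetical background probabilities for a toy 3-letter alphabet
# (illustrative values only, not taken from any paper).
p = {"A": 0.5, "B": 0.3, "C": 0.2}

def expected_unordered(p, i, j):
    """Expected probability of the unordered pair {i, j}:
    PiPj when i == j, and PiPj + PjPi = 2PiPj when i != j."""
    return p[i] * p[j] if i == j else 2 * p[i] * p[j]

# Enumerate each unordered pair once (i <= j in sorted order).
e = {
    (i, j): expected_unordered(p, i, j)
    for i, j in combinations_with_replacement(sorted(p), 2)
}

# Sum over all unordered pairs; equals (sum of Pi)^2 up to rounding.
total = sum(e.values())
```

Without the factor of 2 on the off-diagonal pairs, the same sum over unordered pairs would fall short of 1.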
In many other works, however, starting I think from Altschul 1991, the equation used for the odds ratio is slightly different: qij/PiPj, that is, the 2 in the denominator has disappeared. This equation has been derived from the statistical framework that Karlin and Altschul developed.
What puzzles me is that, although both equations are similar, the Altschul ratio is exactly twice the value of Henikoff's ratio for i ≠ j.
I know that log-odds are rescaled by a factor of 1/lambda before being put into the substitution table, but the missing log(1/2) term is not a rescaling. It is a constant shift, which means that for i ≠ j the final rescaled log-odds value from Altschul is bigger than Henikoff's by x/lambda, where x = log(2) in whatever base is used (the base does not matter and differs between works).
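To show the size of the discrepancy, here is a small sketch that plugs the same qij into both ratios. All numbers are hypothetical, the function name `log_odds` is my own, and the scaling (a multiplier of 2 with base-2 logarithms, i.e. half-bit units) is an assumption made for illustration. Whether the two papers actually mean the same thing by qij is exactly what I am asking, so this only reproduces my reasoning above.

```python
import math

def log_odds(q, e, scale=2.0, base=2.0):
    """Scaled log-odds score: scale * log_base(q / e).
    scale=2, base=2 (half-bit units) is an assumption for illustration."""
    return scale * math.log(q / e, base)

# Hypothetical values for one off-diagonal pair (i != j).
p_i, p_j = 0.05, 0.08
q_ij = 0.006  # the SAME observed value is used in both formulas (assumption)

henikoff = log_odds(q_ij, 2 * p_i * p_j)  # denominator eij = 2PiPj
altschul = log_odds(q_ij, p_i * p_j)      # denominator PiPj

# The difference is scale * log_base(2) and does not depend on q_ij.
offset = altschul - henikoff
```

Under these assumptions the offset is a constant scale * log(2) for every off-diagonal entry and zero on the diagonal, so it cannot be absorbed into a single rescaling factor.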
It is the Altschul ratio that has been more widely used in the years since. Henikoff's equation is, however, still used in some of the papers that derive purpose-specific substitution tables (e.g. in 'Structure-derived substitution matrices for alignment of distantly related sequences' and 'AN AMINO ACID SUBSTITUTION MATRIX FOR PROTEIN CONFORMATION IDENTIFICATION').
What's even more puzzling is that several articles claim that Blosum62 was derived from an Altschul-like odds ratio, forgetting the "2" in the denominator:
Is everything OK here? Do the ratios mean the same thing? Isn't it wrong to say that Blosum62 was derived from an Altschul-like ratio?