Question: BLOSUM80 values differ - but what's the reason?
2.4 years ago by
RamRS17k
Houston, TX
RamRS17k wrote:

Hi all,

I've a simple but intriguing question: I was trying to pull the BLOSUM80 matrix online, and I see that the substitution scores differ among sources.

For example, the Ter score in this matrix is -8, while the same substitution in this NCBI BLOSUM80 is -6.

Is there a "right" matrix? What do these different scores indicate as an underlying factor? I'd appreciate the community's input on this please.

Thank you!

substitution matrix blosum • 1.1k views
modified 2.4 years ago by gearoid200 • written 2.4 years ago by RamRS17k

This may also be related to the miscalculation discovered in the BLOSUM matrices a few years back. Surprisingly, the miscalculated matrices were found to perform better in searches.

This observation is somewhat scary, since it will eventually affect the results.

2.4 years ago by
gearoid200
gearoid200 wrote:

It looks like the two versions are using different units. In the first matrix the unit is 1/3 bit, while in the second matrix, it's 1/2 bit.

From the second matrix:

a scale of ln(2)/2.0


Here ln(2)/2.0 is the conversion factor between nats and half-bit units: one half-bit equals ln(2)/2 nats.
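
To make the unit difference explicit, here is a sketch using the standard log-odds form of a substitution score. The symbols p_{ij} (observed pair frequency) and q_i, q_j (background frequencies) are notation introduced here for illustration; they are not taken from either matrix file.

```latex
S_{ij} = \operatorname{round}\!\left(\frac{1}{\lambda}\,\ln\frac{p_{ij}}{q_i q_j}\right)

\lambda = \frac{\ln 2}{2} \;\Rightarrow\; S_{ij} = \operatorname{round}\!\left(2\,\log_2\frac{p_{ij}}{q_i q_j}\right) \quad \text{(half-bit units)}

\lambda = \frac{\ln 2}{3} \;\Rightarrow\; S_{ij} = \operatorname{round}\!\left(3\,\log_2\frac{p_{ij}}{q_i q_j}\right) \quad \text{(third-bit units)}
```

So, before rounding, a third-bit score is 3/2 times the corresponding half-bit score.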

Good point.

The concerning thing here is that, although the relative ratios (probabilities) will be the same, the total alignment scores will differ between the two, so one has to state which BLOSUM80 matrix was used. "Oh, I used the BLOSUM80 with this scaling, not the other" ... as if there weren't enough issues to deal with.

Frankly, I never thought that you could download two BLOSUM80 matrices from NCBI and get very different data ... well, now I know.

ftp://ftp.ncbi.nlm.nih.gov/blast/matrices/BLOSUM80

http://www.ncbi.nlm.nih.gov/IEB/ToolBox/C_DOC/lxr/source/data/BLOSUM80

I did not realize both were available in NCBI!

I noticed that too, but I'm confused - I recall having to use log_base_2( P(Obs) / P(Exp) ) to get the scores. How would I convert between these two units?

It is a constant scaling factor in the equation:

https://en.wikipedia.org/wiki/BLOSUM#Score_of_the_BLOSUM_matrices

The factor \lambda is a scaling factor, set such that the matrix contains easily computable integer values.
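
As a rough illustration of how that scaling factor relates the two units, here is a minimal Python sketch. The function names (`blosum_score`, `rescale`, `lam`) and the example odds ratio are hypothetical and chosen only for illustration; neither NCBI file provides them. It assumes the score is the rounded log-odds in 1/n-bit units.

```python
import math


def blosum_score(p_obs: float, p_exp: float, units_per_bit: int) -> int:
    """Rounded log-odds score in 1/units_per_bit bit units (2 = half-bit, 3 = third-bit)."""
    return round(units_per_bit * math.log2(p_obs / p_exp))


def rescale(score: int, from_units_per_bit: int, to_units_per_bit: int) -> float:
    """Convert a score between units; the result is generally not an integer."""
    return score * to_units_per_bit / from_units_per_bit


def lam(units_per_bit: int) -> float:
    """Scale factor lambda in nats per score unit, e.g. ln(2)/2 for half-bit units."""
    return math.log(2) / units_per_bit


# Hypothetical odds ratio P(Obs)/P(Exp) = 0.25, i.e. log2 = -2 bits.
half_bit = blosum_score(0.25, 1.0, 2)
third_bit = blosum_score(0.25, 1.0, 3)
print(half_bit, third_bit, rescale(half_bit, 2, 3))
```

Because each published matrix is rounded to integers in its own unit, rescaling one matrix will not always reproduce the other entry for entry; the conversion is exact only before rounding.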