Before getting into your question: given that your proteins are so similar, it may be best and most concise to simply report the exact SNP sites and leave it at that. That said, here are the differences between PAM and BLOSUM:
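If you do go the "just list the sites" route, a minimal sketch could look like the following. It assumes the two protein sequences are already aligned to the same length; the function name and the example sequences are made up for illustration.

```python
def list_substitutions(ref: str, alt: str) -> list[str]:
    """List differences between two equal-length, pre-aligned protein sequences.

    Each entry has the form <reference residue><1-based position><variant residue>.
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{r}{pos}{a}"
        for pos, (r, a) in enumerate(zip(ref, alt), start=1)
        if r != a
    ]


# Hypothetical sequences, for illustration only.
reference = "MKTAYIAKQR"
variant = "MKTAYLAKQK"
print(list_substitutions(reference, variant))  # ['I6L', 'R10K']
```

If your alignment contains gaps, gap characters will show up in the list like any other mismatch, so you may want to handle them separately.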
BLOSUM (BLOcks SUbstitution Matrix) matrices were derived from alignments of highly conserved protein domains at different evolutionary distances, by counting how frequently one amino acid was substituted for another. They are described in this paper by Henikoff. They are based on local alignments of conserved protein regions.
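To make the "how frequently one amino acid was substituted for another" part concrete, here is a rough sketch of the log-odds idea behind BLOSUM scores: compare the observed frequency of an aligned pair with the frequency expected by chance, and take a scaled logarithm. The counts below are invented for a two-letter toy alphabet, and the sketch skips the step where sequences above an identity threshold are clustered (which is what the BLOSUM number refers to).

```python
import math


def log_odds_scores(pair_counts):
    """Half-bit log-odds scores from counts of unordered aligned residue pairs.

    pair_counts maps (a, b) to a positive count; each unordered pair
    should appear only once as a key.
    """
    total = sum(pair_counts.values())
    observed = {pair: n / total for pair, n in pair_counts.items()}

    # Background frequency of each residue, derived from the pair frequencies.
    background = {}
    for (a, b), freq in observed.items():
        if a == b:
            background[a] = background.get(a, 0.0) + freq
        else:
            background[a] = background.get(a, 0.0) + freq / 2
            background[b] = background.get(b, 0.0) + freq / 2

    scores = {}
    for (a, b), freq in observed.items():
        # Chance expectation: p_a^2 for identical pairs, 2 * p_a * p_b otherwise.
        expected = background[a] ** 2 if a == b else 2 * background[a] * background[b]
        scores[(a, b)] = round(2 * math.log2(freq / expected))
    return scores


# Invented counts, for illustration only.
toy_counts = {("L", "L"): 40, ("L", "K"): 10, ("K", "K"): 50}
print(log_odds_scores(toy_counts))
```

Pairs seen more often than chance get positive scores and pairs seen less often get negative ones, which is the general pattern you see in the published matrices.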
PAM (Point Accepted Mutations) matrices were first described by Margaret Dayhoff (who was a fantastic scientist, even in the face of the challenges of her role given the time period). "Each entry in a PAM matrix indicates the likelihood of the amino acid of that row being replaced with the amino acid of that column through a series of one or more point accepted mutations during a specified evolutionary interval, rather than these two amino acids being aligned due to chance." PAM matrices are based on global alignments.
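The "series of one or more point accepted mutations" is why PAM numbers scale the way they do: the mutation probabilities for PAM n are obtained by multiplying the PAM1 probability matrix by itself n times. Here is a small sketch of that extrapolation on an invented three-letter alphabet. The probabilities in `toy_pam1` are made up (each residue stays unchanged 99% of the time per step), and the final conversion from probabilities to log-odds scores is left out.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


# Invented one-step mutation probabilities; entry [i][j] is the chance that
# residue i is replaced by residue j in one step. Each row sums to 1.
toy_pam1 = [
    [0.990, 0.007, 0.003],
    [0.006, 0.990, 0.004],
    [0.002, 0.008, 0.990],
]

for n in (1, 50, 250):
    m = mat_pow(toy_pam1, n)
    unchanged = sum(m[i][i] for i in range(3)) / 3
    print(f"toy PAM{n}: mean probability of staying unchanged = {unchanged:.3f}")
```

The probability of a residue staying unchanged drops as n grows, so higher-numbered PAM matrices model more divergent sequences.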
In short, this is what matters about the differences between the two:
- PAM matrices are typically used on more closely related proteins (such as in your case); BLOSUM matrices are typically used on more evolutionarily divergent proteins.
- The greater the PAM number the more DISTANT the sequences being compared should be; the greater the BLOSUM number, the more SIMILAR the sequences being compared should be.
So for your application, if you were to use these, you should use either a LOW-numbered PAM matrix (e.g. PAM30) or a HIGH-numbered BLOSUM matrix (e.g. BLOSUM80 or BLOSUM90). Whether this is appropriate depends on what you want to get out of it (e.g. differences across the whole protein or just within local protein domains). You're right that these matrices are typically used for alignment scoring, but they can also be used to derive an evolutionary cost or distance. However, there may be better methods for your purpose; it is worth looking into methods for building distance-based trees.
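As a rough idea of what a matrix-derived distance could look like, here is a sketch that scores two pre-aligned sequences with a substitution matrix and normalizes against the self-alignment scores. The normalization used here is just one simple choice I am assuming for illustration, not a standard you have to follow, and `TOY_SCORES` covers only three residues; in practice you would swap in a full low PAM or high BLOSUM matrix.

```python
# Toy substitution scores for three residues only; illustration, not a full matrix.
TOY_SCORES = {
    ("A", "A"): 4, ("S", "S"): 4, ("T", "T"): 5,
    ("A", "S"): 1, ("A", "T"): 0, ("S", "T"): 1,
}


def pair_score(a, b):
    """Look up a symmetric substitution score."""
    key = (a, b) if (a, b) in TOY_SCORES else (b, a)
    return TOY_SCORES[key]


def alignment_score(x, y):
    """Sum substitution scores over two equal-length, pre-aligned sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    return sum(pair_score(a, b) for a, b in zip(x, y))


def score_distance(x, y):
    """Assumed normalization: 1 - S(x, y) / min(S(x, x), S(y, y))."""
    best = min(alignment_score(x, x), alignment_score(y, y))
    return 1 - alignment_score(x, y) / best


print(score_distance("AST", "AST"))  # 0.0 for identical sequences
print(score_distance("AST", "ASS"))  # 0.25
```

A pairwise matrix of such distances is the kind of input that distance-based tree methods expect.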