Question: How to extract the probability of replacing one amino acid by another from a BLOSUM matrix?
shark_110 wrote:

Every element of the BLOSUM matrix is computed by the formula (from Wikipedia):

$$S_{ij} = \frac{1}{\lambda} \log\left(\frac{p_{ij}}{q_i \, q_j}\right)$$

where $p_{ij}$ is the probability of two amino acids $i$ and $j$ replacing each other in a homologous sequence, and $q_i$ and $q_j$ are the background probabilities of finding the amino acids $i$ and $j$ in any protein sequence. The factor $\lambda$ is a scaling factor.

I would like to compute the probability $p_{ij}$ of replacing one amino acid by another. The BLOSUM matrix is implemented in the biopython module, but unfortunately I have not found the probabilities $q_i$ and $q_j$. Is there an easy way to obtain or compute them?

biopython blosum • 2.7k views
modified 5.9 years ago by Hugues250 • written 5.9 years ago by shark_110
Hugues250 wrote:

The background probabilities are the probabilities of occurrence of each amino acid.

These are observed probabilities. You can gather a set of representative proteins for your particular organism and count how often each amino acid occurs. Just to give an idea:

AA observed probabilities in vertebrates:
Alanine 7.4%
Arginine 4.2%
Asparagine 4.4%
Aspartic acid 5.9%
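
The counting approach can be sketched in a few lines of plain Python. This is only an illustration: the two toy sequences, the helper names `background_frequencies` and `pair_probability`, and the example score are assumptions, not part of biopython. The second helper inverts the formula from the question, $p_{ij} = q_i q_j e^{\lambda S_{ij}}$, and assumes a natural logarithm with $\lambda = \ln(2)/2$ (half-bit units, as commonly stated for BLOSUM62). Because published scores are rounded to integers, the result is approximate at best.

```python
import math
from collections import Counter


def background_frequencies(sequences):
    """Return {amino_acid: fraction} over all residues in the given sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def pair_probability(score, q_i, q_j, lam):
    """Invert S_ij = (1/lam) * ln(p_ij / (q_i * q_j)) to recover p_ij."""
    return q_i * q_j * math.exp(lam * score)


# Hypothetical toy sequences, for illustration only.
# In practice, use a representative set of proteins for your organism.
toy_proteins = ["MKTAYIAKQR", "MAAGRDDNAA"]
q = background_frequencies(toy_proteins)

# Assumption: scores are in half-bit units, so lam = ln(2) / 2.
lam = math.log(2) / 2

# Example score of -1 for the pair A/R (an illustrative value).
p_AR = pair_probability(score=-1, q_i=q["A"], q_j=q["R"], lam=lam)
print(f"q_A={q['A']:.3f}  q_R={q['R']:.3f}  p_AR={p_AR:.4f}")
```

With real data you would replace `toy_proteins` with your own sequences and read the score from the matrix you are using.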

Update1: You will therefore get a different BLOSUM matrix for each organism, and also if you are interested in comparing proteins that have non-standard compositions (see this paper for instance: yu2004).

About your question: if you really want to, you could try to compute them. For example, there is only one codon for Tryptophan (UGG), while there are three for Isoleucine (AUU, AUA, AUC). Therefore we could say that Trp is three times less likely than Ile. Knowing that the base composition (written with U, as in the codons above) is 22.0% U, 30.3% A, 21.7% C, and 26.1% G, you could in principle compute the probabilities in silico. Of course, Nature works differently and you should expect discrepancies with the observed probabilities, in particular for Arginine, which does not follow those rules at all.
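
A minimal sketch of that in-silico estimate, assuming the standard genetic code and that the three bases of a codon are drawn independently with the frequencies quoted above. The function name `expected_frequencies` is made up for this example. Stop codons are dropped and the remaining weights are renormalised so the 20 amino acids sum to 1.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order U, C, A, G
# (third position varies fastest); '*' marks stop codons.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def expected_frequencies(base_freq):
    """Expected amino-acid frequencies if codon bases were independent draws."""
    weights = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # skip stop codons
        w = base_freq[codon[0]] * base_freq[codon[1]] * base_freq[codon[2]]
        weights[aa] = weights.get(aa, 0.0) + w
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}


# Base composition quoted in the answer above.
base_freq = {"U": 0.220, "A": 0.303, "C": 0.217, "G": 0.261}
expected = expected_frequencies(base_freq)

# Print amino acids (one-letter codes) from most to least expected.
for aa, f in sorted(expected.items(), key=lambda kv: -kv[1]):
    print(f"{aa}: {f:.1%}")
```

With equal base frequencies this reduces to simple codon counting, so Ile comes out exactly three times as likely as Trp. It ignores codon usage bias and selection, so expect the discrepancies mentioned above.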

Ref