Question: How to extract the probability of replacing one amino acid by another from a BLOSUM matrix?
4.9 years ago by
shark_110
Poland
shark_110 wrote:

Every element of a BLOSUM matrix is computed by the formula (from Wikipedia):

S_{ij}= \left( \frac{1}{\lambda} \right)\log{\left( \frac{p_{ij}}{q_i * q_j} \right)}

 p_{ij} is the probability of two amino acids i and j replacing each other in a homologous sequence, and q_i and q_j are the background probabilities of finding the amino acids i and j in any protein sequence. The factor \lambda is a scaling factor.

I would like to compute the probability of replacing one amino acid i by another amino acid j. The BLOSUM matrix is implemented in the Biopython module, but unfortunately I have not found the probabilities q_i and q_j there. Is there an easy way to obtain or compute them?

biopython blosum • 2.3k views
4.9 years ago by
Hugues250
Oslo, Norway
Hugues250 wrote:

The background probabilities are the probabilities of occurrence of each amino acid.

These are observed probabilities. You can gather a set of representative proteins for your particular organism and count how often each amino acid occurs. Just to give an idea:

Observed amino acid frequencies in vertebrates:
Alanine 7.4%
Arginine 4.2%
Asparagine 4.4%
Aspartic acid 5.9%
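
The counting step described above can be sketched in a few lines of Python. This is only an illustration: the function name `background_frequencies` and the toy sequences are made up for the example, and a real estimate would need a large, representative protein set for the organism of interest.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"

def background_frequencies(sequences, alphabet=STANDARD_AA):
    """Estimate q_i as the fraction of residues of each type.

    Residues outside `alphabet` (e.g. X, B, Z, gaps) are ignored.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(res for res in seq.upper() if res in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found in the input")
    return {aa: counts[aa] / total for aa in alphabet}

# Toy input, purely hypothetical; replace with real protein sequences.
toy_q = background_frequencies(["MKTAYIAKQR", "MALWMRLLPL"])
```

The result is a dictionary mapping each one-letter code to its estimated background probability, and the values sum to 1.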

 

Update 1: You will therefore get a different BLOSUM matrix for each organism, and also a different one if you are interested in comparing proteins that have non-standard compositions (see this paper for instance: yu2004).
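
Once background probabilities q_i and q_j are in hand, the formula quoted in the question can be rearranged to p_{ij} = q_i q_j \exp(\lambda S_{ij}). Here is a minimal sketch of that rearrangement. It assumes the natural logarithm and a matrix scaled in half-bit units, which gives \lambda = \ln(2)/2; that default, and the function names, are assumptions of this example rather than something stated in the thread. Because published BLOSUM scores are rounded to integers, the recovered values are only approximate.

```python
import math

# Assumed scaling: half-bit units, so S = 2 * log2(ratio) and lam = ln(2) / 2.
HALF_BIT_LAMBDA = math.log(2) / 2

def pair_probability(score, q_i, q_j, lam=HALF_BIT_LAMBDA):
    """Invert S_ij = (1/lam) * ln(p_ij / (q_i * q_j)) to get p_ij."""
    return q_i * q_j * math.exp(lam * score)

def substitution_probability(score, q_j, lam=HALF_BIT_LAMBDA):
    """Probability of seeing j aligned to a given i, i.e. p_ij / q_i."""
    return q_j * math.exp(lam * score)

# Example with the vertebrate frequencies quoted above for alanine (7.4%)
# and arginine (4.2%), and a score of -1 for the pair.
p_ala_arg = pair_probability(-1, 0.074, 0.042)
p_arg_given_ala = substitution_probability(-1, 0.042)
```

The scores themselves can be read from Biopython; in recent releases `Bio.Align.substitution_matrices.load("BLOSUM62")` returns the matrix, but the background probabilities still have to come from elsewhere.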

About your question: if you really want, you could try to compute the background probabilities yourself. For example, there is only one codon for tryptophan (UGG), while there are three for isoleucine (AUU, AUA, AUC). We could therefore say that Trp is one third as likely as Ile. Knowing that the base composition is 22.0% U, 30.3% A, 21.7% C, and 26.1% G, you could in principle compute the probabilities in silico. Of course, nature works differently, and you should expect discrepancies with the observed probabilities, in particular for arginine, which does not follow those rules at all.
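
The in-silico estimate described above can be sketched as follows, assuming the standard genetic code, independent bases at each codon position, and stop codons excluded before renormalising. The function name `codon_based_frequencies` is invented for this example, and the simplifications mean the output is a rough expectation only, not a substitute for observed frequencies.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order U, C, A, G
# (third position varying fastest); "*" marks stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def codon_based_frequencies(base_freqs):
    """Expected amino acid frequencies from base composition alone."""
    total = sum(base_freqs.values())
    norm = {base: f / total for base, f in base_freqs.items()}
    freqs = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # stop codons do not encode an amino acid
        p = norm[codon[0]] * norm[codon[1]] * norm[codon[2]]
        freqs[aa] = freqs.get(aa, 0.0) + p
    coding = sum(freqs.values())
    return {aa: p / coding for aa, p in freqs.items()}

# Base composition quoted above (sums to 100.1%, so it is renormalised).
quoted = {"U": 22.0, "A": 30.3, "C": 21.7, "G": 26.1}
expected_q = codon_based_frequencies(quoted)
```

With equal base frequencies the Ile to Trp ratio is exactly 3, as argued above from the codon counts; with the quoted composition it shifts somewhat, because the codons are no longer equally probable.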

Ref


Please upvote and "accept" if this answered your question.
