Question: identity, positivity and similarity in blastp results
21 months ago by
utsafar40 wrote:

Using blastp, the ppos (percentage of positive scoring matches) is:

ppos = ((number of identical matches) + (number of similar matches)) / (alignment length) × 100

But I think that ppos underestimates the importance of identity. I looked at the BLOSUM62 table and saw that the score of identical matches is between 4 and 11, while the score of similar matches is 1 or 2. So I came up with this equation:

Similar matches score (smscore) = ((number of identical matches) + 1/5 × (number of similar matches)) / (alignment length) × 100

For example, if in a 100 AA alignment the number of identical matches is 45 and the number of similar (non-identical) matches is 15, then ppos is 60% and smscore is 48%.
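The arithmetic in this example can be checked with a short Python sketch. The function name `alignment_percentages` and the `similar_weight` parameter are made up for illustration; only the 1/5 weight and the 45/15/100 figures come from the question, and `similar` is assumed to count positive-scoring matches that are not identical.

```python
def alignment_percentages(identical, similar, length, similar_weight=0.2):
    """Return (pident, ppos, smscore) as percentages of the alignment length.

    `similar` is the number of positive-scoring matches that are NOT identical.
    `similar_weight` is the 1/5 weight proposed in the question (hypothetical name).
    """
    if length <= 0:
        raise ValueError("alignment length must be positive")
    pident = 100.0 * identical / length
    ppos = 100.0 * (identical + similar) / length
    smscore = 100.0 * (identical + similar_weight * similar) / length
    return pident, ppos, smscore


# The example from the question: 45 identical and 15 similar matches over 100 AA.
pident, ppos, smscore = alignment_percentages(45, 15, 100)
print(f"pident={pident:.1f}%  ppos={ppos:.1f}%  smscore={smscore:.1f}%")
```

Because the weight is between 0 and 1, smscore always falls between pident and ppos, whatever the counts are.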

I see that it is a score between the percentage of identity and the percentage of positives. Am I correct? Am I leading myself to a better understanding of the similarity of two sequences, or misleading myself?

Does blastp itself have any score like my smscore?

blast blastp similarity • 1.2k views

See this post; the book Damian recommends there looks helpful.

Blast raw score calculation

written 21 months ago by natasha.sernova3.7k
21 months ago by
VIB, Ghent, Belgium
lieven.sterck7.8k wrote:

I think you're misleading yourself ;)

If two proteins are 100% similar but share no identical matches, your reasoning would give them a very low score, even though this could be a very relevant biological match (exactly what we are looking for).

And yes, you can request pident in your BLAST output, which gives the percentage of identical matches rather than all the positives.
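As a rough sketch of how these columns could be read back, assume blastp was run with a tabular format such as `-outfmt "6 qseqid sseqid pident ppos nident positive length bitscore"` (these are standard BLAST+ format specifiers, but the column order is a choice made here). The example line is invented, not real BLAST output, and the helper names `parse_hit` and `similar_only` are hypothetical. Note that BLAST's `positive` count includes the identical matches, so the "similar" count used in the question is `positive - nident`.

```python
# Column order must match the -outfmt string assumed above.
FIELDS = ["qseqid", "sseqid", "pident", "ppos", "nident", "positive", "length", "bitscore"]


def parse_hit(line, fields=FIELDS):
    """Parse one tab-separated BLAST hit line into a dict with typed values."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} columns, got {len(values)}")
    hit = dict(zip(fields, values))
    for name in ("pident", "ppos", "bitscore"):
        hit[name] = float(hit[name])
    for name in ("nident", "positive", "length"):
        hit[name] = int(hit[name])
    return hit


def similar_only(hit):
    """Positive-scoring matches that are not identical."""
    return hit["positive"] - hit["nident"]


# Invented example line (not real BLAST output).
example = "query1\tsubject1\t45.00\t60.00\t45\t60\t100\t85.5"
hit = parse_hit(example)
print(hit["pident"], hit["ppos"], similar_only(hit))
```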


@lieven.sterck Thank you, but I think:

1- Based on your answer, pident is even more misleading, because it ignores similarity entirely.

2- If you align two homologous proteins, the number of identical matches is often (much) higher than the number of similar matches. So two proteins with 100% similarity and no identity (or something close to that) are practically impossible.

modified 21 months ago • written 21 months ago by utsafar40

1- Yes, that is true; I thought you were looking for a 'more easily' interpretable measure.

2- I would certainly not take that as truth. It might have been your observation, but you would be surprised how often it happens (OK, the nothing-to-everything case will be exceptional). At the DNA level it's all about identity; at the protein level it's all about similarity!

If you are looking for a measure that takes everything into account (somewhat "weighted"), then why not go for the bitscore? That is exactly what it's there for.
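To illustrate why the bitscore is already "weighted": the raw alignment score sums the substitution-matrix score of every aligned pair (minus gap penalties), and the bit score normalizes it as S' = (lambda × S − ln K) / ln 2. The sketch below is only an illustration under stated assumptions: it uses a tiny hand-copied subset of BLOSUM62, handles ungapped alignments only, and the `lam` and `k` defaults are placeholder values of the kind blastp prints in its report footer; read the real ones from your own run.

```python
import math

# Small subset of BLOSUM62 (symmetric); enough for the toy alignment below.
BLOSUM62_SUBSET = {
    ("W", "W"): 11, ("A", "A"): 4, ("L", "L"): 4,
    ("I", "V"): 3, ("K", "R"): 2, ("S", "T"): 1,
    ("A", "W"): -3,
}


def pair_score(a, b, matrix=BLOSUM62_SUBSET):
    """Look up a substitution score, trying both orders of the pair."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]  # KeyError if the pair is not in the subset


def raw_score(query, subject):
    """Sum of substitution scores for an ungapped, equal-length alignment."""
    if len(query) != len(subject):
        raise ValueError("sequences must be aligned to the same length")
    return sum(pair_score(a, b) for a, b in zip(query, subject))


def bit_score(raw, lam=0.267, k=0.041):
    """Normalized score S' = (lam * S - ln K) / ln 2; lam and k are placeholders."""
    return (lam * raw - math.log(k)) / math.log(2)


s = raw_score("WAIKS", "WAVRT")  # two identities, three similar pairs
print(s, round(bit_score(s), 2))
```

In this toy alignment the two identical pairs (W/W and A/A) contribute 15 of the raw score and the three similar pairs contribute 6, so identity already counts for more without any hand-picked weight.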

modified 21 months ago • written 21 months ago by lieven.sterck7.8k