Question: identity, positivity and similarity in blastp results
9 months ago
utsafar20 wrote:

Using blastp, the ppos (percentage of positive scoring matches) is:

((number of identical matches) + (number of similar matches)) / (alignment length)

But I think that ppos underestimates the importance of identity. I looked at the BLOSUM62 table and saw that the score of an identical match is between 4 and 11, while the score of a similar match is mostly 1 or 2. So I came up with this equation:

Similar matches score (smscore) = ((number of identical matches) + 1/5 × (number of similar matches)) / (alignment length)

For example, if a 100 AA alignment has 45 identical matches and 15 similar matches, then ppos is 60% and smscore is 48%.
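The arithmetic in this example can be checked with a short Python sketch. The function names and the `similar_weight` parameter are made up here for illustration; they are not part of BLAST. The 1/5 weight is the poster's own choice, not a value taken from BLOSUM62.

```python
def pident(identical: int, alignment_length: int) -> float:
    """Percentage of identical matches."""
    return 100.0 * identical / alignment_length


def ppos(identical: int, similar: int, alignment_length: int) -> float:
    """Percentage of positive-scoring matches, as defined in the post:
    identical and similar matches count equally."""
    return 100.0 * (identical + similar) / alignment_length


def smscore(identical: int, similar: int, alignment_length: int,
            similar_weight: float = 0.2) -> float:
    """Proposed score: a similar match counts for 1/5 of an identical one.
    similar_weight is a hypothetical parameter, defaulting to the 1/5 used in the post."""
    return 100.0 * (identical + similar_weight * similar) / alignment_length


# The example from the post: 100 AA alignment, 45 identical, 15 similar.
print(f"pident  = {pident(45, 100):.1f}%")
print(f"ppos    = {ppos(45, 15, 100):.1f}%")
print(f"smscore = {smscore(45, 15, 100):.1f}%")
```

By construction, smscore always falls between pident and ppos whenever the weight is between 0 and 1.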

I see that it is a score between the percentage of identity and the percentage of positives. Am I correct? Am I leading myself to a better understanding of the similarity of two sequences, or misleading myself?

Does blastp itself have any score like my smscore?

blast blastp similarity
9 months ago by utsafar20

9 months ago
VIB, Ghent, Belgium
lieven.sterck (4.8k) wrote:

I think you're misleading yourself ;)

If two proteins are 100% similar but share no identical matches, your reasoning gives them a very low score, even though this could be a very relevant biological match (which is exactly what we are looking for).

Yes, you can request pident in your BLAST output, which gives the percentage of identical matches rather than all the positives.
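As a rough sketch of how these columns could be read side by side: assuming the search was run with tabular output and a custom column list such as `-outfmt "6 qseqid sseqid pident ppos length bitscore"`, the result can be parsed in Python as below. The column order in `COLUMNS` is an assumption and must match whatever was passed to `-outfmt`; the example line is invented for illustration and is not real BLAST output.

```python
import csv
import io

# Assumption: this order matches the -outfmt string used for the search.
COLUMNS = ["qseqid", "sseqid", "pident", "ppos", "length", "bitscore"]


def parse_hits(handle):
    """Yield one dict per hit from tab-separated BLAST output."""
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines (tabular output with comments).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        for key in ("pident", "ppos", "bitscore"):
            hit[key] = float(hit[key])
        hit["length"] = int(hit["length"])
        yield hit


# Invented line for illustration only.
example = "query1\tsubject1\t45.00\t60.00\t100\t85.5\n"
for hit in parse_hits(io.StringIO(example)):
    print(hit["sseqid"], hit["pident"], hit["ppos"], hit["bitscore"])
```

For a real run, `parse_hits(open("results.tsv"))` would work the same way, where `results.tsv` is a placeholder file name.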

modified 9 months ago • written 9 months ago by lieven.sterck (4.8k)

@lieven.sterck Thank you, but I think:

1- Based on your answer, pident is even more misleading, because it ignores similarity altogether.

2- If you align two homologous proteins, the number of identical matches is often (much) higher than the number of similar matches. So having two proteins with 100% similarity and no identity (or something near this) is actually impossible.

modified 9 months ago • written 9 months ago by utsafar20

1- Yes, that is true. I thought you were looking for a 'more easily' interpretable measure.

2- I would certainly not take that as truth. It might have been your observation, but you would be surprised how often it happens (OK, going from nothing to everything will be an exceptional case). On the DNA level it's all about identity, on the protein level it's all about similarity!

If you are looking for a measure that takes everything into account (somewhat "weighted"), then why not go for the bitscore? That is exactly what it's there for.
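To make the "weighted" point concrete: the raw alignment score is the sum of the substitution-matrix values over the aligned positions (minus gap penalties), so each identical or similar pair already contributes its own BLOSUM62 weight. The bit score is the usual normalization of that raw score, S' = (λS − ln K) / ln 2. A minimal Python sketch follows; the λ and K values in the example are the ones commonly reported for BLOSUM62 with gap costs 11/1 and are an assumption here, so check the statistics printed at the end of your own BLAST report.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalize a raw alignment score S to a bit score:
    S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Assumed parameters (commonly reported for gapped BLOSUM62, gap costs 11/1).
LAMBDA = 0.267
K = 0.041

# A raw score of 100 is an arbitrary illustrative value.
print(f"{bit_score(100, LAMBDA, K):.1f} bits")
```

Because the raw score weights every aligned pair by its matrix value, there is no need to pick a single fixed ratio such as 1/5 between identical and similar matches.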

modified 9 months ago • written 9 months ago by lieven.sterck (4.8k)
