Question

identity, positivity and similarity in blastp results

5.7 years ago

utsafar ▴ 80

Using blastp, the ppos (percentage of positive scoring matches) is:

ppos = 100 × ((number of identical matches) + (number of similar matches)) / (alignment length)

But I think ppos underestimates the importance of identity. Looking at the BLOSUM62 table, I see that identical matches score between 4 and 11, while similar (positive but non-identical) matches score only 1 to 3. So I made this equation:

Similar matches score (smscore) = 100 × ((number of identical matches) + (1/5)(number of similar matches)) / (alignment length)

For example, if a 100-AA alignment has 45 identical matches and 15 similar matches, then ppos is 60% and smscore is 48%.
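The arithmetic in this example can be checked with a minimal Python sketch. The function name `percent_metrics` and the `similar_weight` parameter are hypothetical names chosen for illustration; the 1/5 weight is the value proposed in the question, not anything blastp computes.

```python
def percent_metrics(n_identical, n_similar, aln_length, similar_weight=0.2):
    """Return (pident, ppos, smscore) as percentages.

    n_similar counts positive-scoring substitutions that are NOT identical.
    similar_weight is the 1/5 weighting proposed in the question (an
    assumption of this sketch, not a BLAST parameter).
    """
    if aln_length <= 0:
        raise ValueError("alignment length must be positive")
    pident = 100.0 * n_identical / aln_length
    ppos = 100.0 * (n_identical + n_similar) / aln_length
    smscore = 100.0 * (n_identical + similar_weight * n_similar) / aln_length
    return pident, ppos, smscore


# The example from the question: 45 identities and 15 similar matches
# over a 100-residue alignment.
pident, ppos, smscore = percent_metrics(45, 15, 100)
print(round(pident, 2), round(ppos, 2), round(smscore, 2))
```

By construction smscore always falls between pident and ppos whenever the weight is between 0 and 1.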

I see that it is a score between the percentage of identity and the percentage of positives. Am I correct? Am I leading myself to a better understanding of the similarity of two sequences, or misleading myself?

Does blastp itself have any score like my smscore?

blastp blast similarity • 4.8k views

See this post; the book Damian recommends there looks helpful.

Blast raw score calculation

ADD REPLY • link 5.7 years ago by natasha.sernova ★ 4.0k

score 0 · Answer 1 · 2018-08-06

5.7 years ago

lieven.sterck 15k

I think you're misleading yourself ;)

So if two proteins are 100% similar but have no identical matches, they will get a very low score by your reasoning, while this could be a very relevant biological match (which is exactly what we are looking for).

Yes, you can request pident in your BLAST output, which gives the percentage of identical matches rather than all the positives.
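As a rough sketch of how pident and ppos could be read side by side, the Python below parses tab-separated BLAST output. It assumes the columns were requested in the order shown in the comment; the file and database names in that comment are placeholders, and the sample line is made up for illustration rather than taken from a real run.

```python
import csv
import io

# Assumed column order, matching a run such as:
#   blastp -query query.faa -db mydb \
#       -outfmt "6 qseqid sseqid pident ppos length bitscore"
# (query.faa and mydb are placeholder names).
COLUMNS = ["qseqid", "sseqid", "pident", "ppos", "length", "bitscore"]


def read_hits(handle):
    """Yield one dict per hit from tab-separated BLAST output."""
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the comment lines written by -outfmt 7.
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        for key in ("pident", "ppos", "bitscore"):
            hit[key] = float(hit[key])
        hit["length"] = int(hit["length"])
        yield hit


# Made-up line for illustration only, not real BLAST output.
example = "protA\tprotB\t45.00\t60.00\t100\t95.5\n"
hits = list(read_hits(io.StringIO(example)))
for hit in hits:
    print(hit["qseqid"], hit["sseqid"], hit["pident"], hit["ppos"])
```

With a real results file, `read_hits(open(path))` would take the place of the `io.StringIO` wrapper.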

@lieven.sterck Thank you, but I think:

1- Based on your answer, pident is even more misleading, because it ignores similarity altogether.

2- If you align two homologous proteins, the number of identical matches is often (much) higher than the number of similar matches. So having two proteins with 100% similarity and no identity (or something near this) is actually impossible.

ADD REPLY • link 5.7 years ago by utsafar ▴ 80

1- Yes, that is true; I thought you were looking for a more easily interpretable measure.

2- I would certainly not take that as truth. It might have been your observation, but you would be surprised how often it happens (OK, the nothing-to-everything case will be exceptional). On the DNA level it's all about identity; on the protein level it's all about similarity!

If you are looking for a measure that takes everything into account (somewhat "weighted"), then why not go for the bitscore? That is exactly what it's there for.
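To illustrate why the bitscore is already "weighted", here is a minimal Python sketch. The raw score sums the substitution-matrix value of every aligned pair, so identities count for more than similar pairs, and the bit score normalizes it as S' = (λS − ln K) / ln 2. The four BLOSUM62 entries are hard-coded for the toy alignment, gaps are ignored, and the λ and K values are the ones commonly reported for gapped BLOSUM62 with gap costs 11/1; treat them as assumptions and use the values printed in your own BLAST report instead.

```python
import math

# A few BLOSUM62 entries, hard-coded for this toy example only.
BLOSUM62_SUBSET = {
    ("W", "W"): 11,  # identical
    ("A", "A"): 4,   # identical
    ("I", "V"): 3,   # similar
    ("F", "Y"): 3,   # similar
}

# Assumed Karlin-Altschul parameters (gapped BLOSUM62, gap costs 11/1).
LAMBDA = 0.267
K = 0.041


def raw_score(seq_a, seq_b, matrix):
    """Sum substitution scores over an ungapped, equal-length alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(matrix[(a, b)] for a, b in zip(seq_a, seq_b))


def bit_score(raw, lam, k):
    """Convert a raw alignment score to a bit score."""
    return (lam * raw - math.log(k)) / math.log(2)


raw = raw_score("WAIF", "WAVY", BLOSUM62_SUBSET)  # 11 + 4 + 3 + 3
print(raw, round(bit_score(raw, LAMBDA, K), 1))
```

Because each residue pair contributes its own matrix value, a W/W identity (11) outweighs an A/A identity (4), which a single fixed weight for all identities cannot capture.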

ADD REPLY • link 5.7 years ago by lieven.sterck 15k