Question

Can BLASTn be used to calculate sequence similarity?

0

4.0 years ago

jamie.pike ▴ 80

I'm a little confused by the use of similarity and identity and was wondering if someone could help set me straight.

I recently read a paper in which the authors state that, following BLASTN, they extracted and used sequences with >90% similarity. However, I do not know how they calculated similarity using BLASTN. I am aware that you can set an identity threshold in BLASTN with the argument -perc_identity 90.00, but as far as I understand, similarity and identity are not the same thing. For instance here, sequences A and B have 100% identity but 60% similarity.

Do you think the authors actually mean similarity in this instance? If so, how do I calculate similarity using BLASTN?

Thanks

BLASTN Similarity identity • 1.1k views

updated 4.0 years ago by lieven.sterck 15k • written 4.0 years ago by jamie.pike ▴ 80

score 1 · Answer 1 · 2020-05-14

1

4.0 years ago

lieven.sterck 15k

It can indeed be confusing, and it depends on the setting/context.

For a BLASTN analysis, similarity equals identity (technically we don't use similarity for BLASTN results, only identity), as there are only 4 letters (bases) to work with and all 4 are distinct from each other. You can't say that some of those bases are more or less similar to one another: it's a match or it's not a match.

For BLASTP (or any protein BLAST), there is a difference, as there are 20 (21) letters in that alphabet and some of them are similar to each other but not identical (e.g. leucine and isoleucine are similar but not identical). So for all protein-related BLAST searches there is a fundamental difference between similarity and identity.

"For instance here, sequence A and B = 100% identity but 60% similarity."

Sorry, that is simply wrong: identical positions also count as similar, so similarity can never be lower than identity.
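
To make the distinction concrete, here is a minimal Python sketch that counts identical and similar positions in two pre-aligned, ungapped sequences. The similarity groups (`TOY_PROTEIN_GROUPS`) and the function name are made up for illustration; real protein BLAST decides "positives" from a substitution matrix, not from groups like these.

```python
# Toy similarity groups: an illustrative assumption, NOT a real substitution matrix.
TOY_PROTEIN_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"), set("ST"), set("NQ")]


def percent_identity_and_similarity(seq_a, seq_b, similar_groups=()):
    """Return (identity %, similarity %) for two aligned, equal-length, ungapped sequences.

    With no similarity groups (the nucleotide case), a position is "similar"
    only if it is identical, so both percentages are the same.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")

    identical = 0
    similar = 0
    for a, b in zip(seq_a, seq_b):
        if a == b:
            # An identical position always counts as similar too.
            identical += 1
            similar += 1
        elif any(a in group and b in group for group in similar_groups):
            similar += 1

    n = len(seq_a)
    return 100.0 * identical / n, 100.0 * similar / n


# Nucleotides: no notion of "similar but not identical".
print(percent_identity_and_similarity("AAGGC", "AAGGT"))  # (80.0, 80.0)

# Protein: L and I differ but share a toy group, so similarity exceeds identity.
print(percent_identity_and_similarity("MKLIV", "MKIIV", TOY_PROTEIN_GROUPS))  # (80.0, 100.0)
```

Because every identical position is also counted as similar, the second number can never drop below the first.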

4.0 years ago by lieven.sterck 15k

0

Ah, that makes much more sense. Thank you! So similarity will only ever be greater than identity when it comes to BlastP, and is the same for BlastN. Is there another factor that will take into account the length of the alignment? Say A is longer than B, but B is identical for x many bases? e.g. A: AAGGCTT B: AAGGC

Reply • 4.0 years ago by jamie.pike ▴ 80

0

That's correct indeed.

Well, there is a parameter that provides the query coverage % (about 71% in your example if A is the query, since 5 of its 7 bases are aligned).
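
As a rough illustration of what query coverage measures, the sketch below computes it by hand for the A/B example. The function name and the 1-based inclusive coordinates are assumptions made for this example; in practice BLAST+ can report coverage itself in tabular output (the `qcovs` and `qcovhsp` columns), so you would not normally compute it yourself.

```python
def query_coverage(query_len, q_start, q_end):
    """Percent of the query spanned by one alignment (1-based, inclusive coordinates)."""
    if query_len <= 0:
        raise ValueError("query_len must be positive")
    aligned = abs(q_end - q_start) + 1
    return 100.0 * aligned / query_len


seq_a = "AAGGCTT"  # 7 bases
seq_b = "AAGGC"    # 5 bases, matching the first 5 bases of A

# A as query: only positions 1-5 of its 7 bases are aligned.
print(round(query_coverage(len(seq_a), 1, 5), 1))  # 71.4

# B as query: all 5 of its bases are aligned.
print(query_coverage(len(seq_b), 1, 5))  # 100.0
```

Note that the value depends on which sequence is the query: the same 5-base, 100%-identity alignment covers only part of A but all of B.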

Since it's BLASTN (and only for BLASTN), you can also use the bit score as a kind of proxy for the alignment length, because the scoring of a nucleotide alignment is quite linear: each matching base adds the same fixed reward.
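
A small sketch of why the score tracks length for nucleotide alignments: with a fixed match reward and mismatch penalty, the raw score of an ungapped alignment is just a weighted count of columns. The reward and penalty values below are illustrative assumptions (the actual defaults depend on the blastn task you run), and the function name is hypothetical. The bit score that BLAST reports is a linear rescaling of this raw score, so the same proportionality carries over.

```python
def ungapped_raw_score(matches, mismatches, reward=1, penalty=-2):
    """Raw score of an ungapped nucleotide alignment.

    reward/penalty are assumed example values, not necessarily your blastn settings.
    """
    return matches * reward + mismatches * penalty


# Perfect matches: the score grows in direct proportion to alignment length.
for length in (50, 100, 200):
    print(length, ungapped_raw_score(length, 0))

# Same length (100) but 5 mismatches: the score drops, so a bit score
# reflects both the length and the quality of the alignment.
print(ungapped_raw_score(95, 5))  # 85
```

This is also why the proxy only works well for near-identical hits: once mismatches or gaps appear, a long mediocre alignment and a short perfect one can end up with similar scores.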

Reply • 4.0 years ago by lieven.sterck 15k