Can BLASTn be used to calculate sequence similarity?
4.0 years ago
jamie.pike ▴ 80

I'm a little confused by the use of similarity and identity and was wondering if someone could help set me straight.

I recently read a paper in which the authors state that, following BLASTN, they extracted and used sequences with >90% similarity. However, I do not know how they calculated similarity using BLASTN. I am aware that you can set an identity threshold in BLASTN with the argument -perc_identity 90.00, but as far as I understand, similarity and identity are not the same thing. For instance here, sequence A and B = 100% identity but 60% similarity.

Do you think the authors actually mean similarity in this instance? If so, how do I calculate similarity using BLASTN?

Thanks

BLASTN Similarity identity • 1.1k views
4.0 years ago

It can indeed be confusing, and it depends on the setting/context.

For a BLASTN analysis, similarity equals identity (technically we don't use similarity for BLASTN results, only identity): there are only 4 letters (bases) to work with, and all 4 are distinct from each other. You can't say that some bases are more or less similar to one another; a position is either a match or it is not.
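To make the match/no-match idea concrete, here is a minimal sketch of how percent identity can be computed from two already-aligned nucleotide strings. The function name `percent_identity` and the example sequences are made up for illustration; the sketch assumes identity is the number of identical columns divided by the alignment length, gaps included.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same base.

    Both inputs must already be aligned (equal length, '-' for gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as a match only if the bases are equal and not a gap.
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


print(percent_identity("AAGGC", "AAGGC"))        # 100.0
print(percent_identity("AAGGCTTA", "AAGGATTA"))  # 87.5 (7 of 8 columns match)
```

With only four distinct bases there is no "partial credit" column, so no separate similarity figure comes out of this calculation.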

For BLASTP (or any protein BLAST) there is a difference: that alphabet has 20 (21) letters, and some of them are similar to each other but not identical (e.g. leucine and isoleucine). So for all protein-related BLAST searches there is a fundamental difference between similarity and identity.
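As a rough illustration of why the two numbers diverge for proteins, the sketch below counts identical columns and "similar" columns separately. The `SIMILAR_PAIRS` set is a small hand-picked subset of residue pairs that score positively in BLOSUM62, chosen only to keep the example short; a real BLASTP run uses the full scoring matrix. The function name and the two peptide strings are hypothetical.

```python
# Assumption: a tiny illustrative subset of conservative substitutions
# (pairs with a positive BLOSUM62 score), not the full matrix.
SIMILAR_PAIRS = {frozenset(p) for p in ("LI", "LV", "IV", "KR", "DE", "FY", "ST")}


def identity_and_similarity(aligned_a: str, aligned_b: str) -> tuple[float, float]:
    """Return (percent identity, percent similarity) for two aligned peptides."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = 0
    positive = 0  # identical columns plus conservative substitutions
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns count toward the length but never match
        if a == b:
            identical += 1
            positive += 1
        elif frozenset((a, b)) in SIMILAR_PAIRS:
            positive += 1
    n = len(aligned_a)
    return 100.0 * identical / n, 100.0 * positive / n


ident, simil = identity_and_similarity("MKLDF", "MRIDY")
print(ident, simil)  # 40.0 100.0
```

Here only M and D are identical, while K/R, L/I and F/Y are counted as similar, so similarity exceeds identity.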

For instance here, sequence A and B = 100% identity but 60% similarity.

Sorry, that is simply wrong.


Ah, that makes much more sense. Thank you! So similarity will only ever be greater than identity when it comes to BLASTP, and the two are the same for BLASTN. Is there another factor that takes the length of the alignment into account? Say A is longer than B, but B is identical over x bases, e.g. A: AAGGCTT, B: AAGGC.


That's correct indeed.

As for alignment length, there is a parameter that reports the query coverage % (roughly 71% in your example: 5 of 7 bases, taking A as the query).

Since it's BLASTN (and only for BLASTN), you can also use the bitscore as a kind of proxy for the alignment length, because the scoring of a nucleotide alignment is close to linear in its length.
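A minimal sketch of how query coverage could be computed and combined with an identity threshold when post-filtering tabular BLASTN output. It assumes the hits were written with `-outfmt "6 qseqid sseqid pident length qcovs"`; the helper names, thresholds, and the two example rows are hypothetical and not taken from any real run.

```python
def query_coverage(query_len: int, q_start: int, q_end: int) -> float:
    """Percent of the query spanned by one alignment (1-based, inclusive coordinates)."""
    lo, hi = sorted((q_start, q_end))
    return 100.0 * (hi - lo + 1) / query_len


# A = AAGGCTT is the 7-base query; B = AAGGC aligns to query positions 1-5.
print(round(query_coverage(7, 1, 5), 1))  # 71.4

# Assumed column order, matching the -outfmt string named above.
COLUMNS = ("qseqid", "sseqid", "pident", "length", "qcovs")


def filter_hits(lines, min_identity=90.0, min_coverage=90.0):
    """Return subject IDs whose identity and query coverage both pass the cutoffs."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(COLUMNS, line.split("\t")))
        if float(row["pident"]) >= min_identity and float(row["qcovs"]) >= min_coverage:
            kept.append(row["sseqid"])
    return kept


# Hypothetical rows: B is identical but covers only part of the query.
example = [
    "A\tB\t100.000\t5\t71",
    "A\tC\t100.000\t7\t100",
]
print(filter_hits(example))  # ['C']
```

Filtering on both columns keeps a short but perfect match such as B from passing on identity alone.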
