How is percentage of identity calculated?
2.3 years ago

Hello all,

I'm using Plastx (a faster equivalent of Blastx) to compare my 500k contigs against the NCBI database in order to check for suspected contamination from an unknown source. I would like to keep only the results with a percentage of identity > 95%. However, there is no "percentage identity" column in the output file.

I do have some other information, such as HSP_identity, intensity and alignment, so I was wondering whether it is possible to calculate this percentage myself.

Thanks !

alignment • 683 views
2.3 years ago
Mensur Dlakic ★ 22k

I have never used the program you refer to, so my answer is based only on the information you provided.

HSP_identity is likely what you want. HSP usually stands for high-scoring segment pairs, which implies that Plastx is a local aligner like BLAST. If so, HSP identity may refer only to fragments of the whole alignment, in which case it may not be equivalent to global identity. Think of it this way: if there is a single HSP per sequence, it is probably safe to assume that HSP identity is the same as overall (global) identity. If there are multiple HSPs per sequence, that would require a second look.
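A quick way to check whether any query/subject pair has more than one HSP is sketched below in Python. This is only an illustration: the `hits` list and its (query, subject) layout are hypothetical and would need to be filled from the actual Plastx output, whose column layout I have not verified.

```python
from collections import Counter


def pairs_with_multiple_hsps(hits):
    """Return the (query, subject) pairs that appear in more than one HSP row."""
    counts = Counter((query, subject) for query, subject in hits)
    return sorted(pair for pair, n in counts.items() if n > 1)


# Hypothetical rows: one (query_id, subject_id) tuple per reported HSP.
hits = [
    ("contig_1", "protA"),
    ("contig_1", "protA"),
    ("contig_2", "protB"),
]

# Pairs listed here are the ones whose HSP identity deserves a second look.
print(pairs_with_multiple_hsps(hits))
```

If the list comes back empty, every pair has a single HSP and the per-HSP identity can be read as the identity of that one local alignment.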

To make your life easier (although the search will likely be slower), you may want to consider using a global aligner. That way the identity obtained will reflect the whole alignment.


Thanks for your help! I'm a bit confused, since I specified -maxhsps 1 in my command and yet I sometimes get values > 100 in the HSP_identity column, especially when the HSP e-value is low.

EDIT: I found the answer to my question: in Plast, percentage identity = 100 × HSP_identity / HSP_align_length. It sometimes differs slightly from the Blast results because the calculated alignment length can vary by up to 2 bp.
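For anyone who wants to apply the same filter, here is a minimal Python sketch. It assumes that HSP_identity is a count of identical positions (which would explain values above 100) and that HSP_align_length is the alignment length. The dictionary rows and the function names are hypothetical, not part of Plast; the rows would have to be parsed from the real output file.

```python
def percent_identity(hsp_identity, hsp_align_length):
    """Percentage identity, assuming hsp_identity is a count of identical positions."""
    if hsp_align_length <= 0:
        raise ValueError("alignment length must be positive")
    return 100.0 * hsp_identity / hsp_align_length


def filter_hits(rows, min_pct=95.0):
    """Keep the rows whose percentage identity is strictly above min_pct."""
    return [
        row for row in rows
        if percent_identity(row["HSP_identity"], row["HSP_align_length"]) > min_pct
    ]


# Hypothetical rows standing in for parsed output lines.
rows = [
    {"query": "contig_1", "HSP_identity": 120, "HSP_align_length": 122},
    {"query": "contig_2", "HSP_identity": 80, "HSP_align_length": 100},
]

kept = filter_hits(rows)
print([row["query"] for row in kept])
```

With these made-up numbers only contig_1 passes the 95% cutoff (120/122 is about 98.4%), while contig_2 sits at 80% and is dropped.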