BLAST definition and difference between 'qcovs' and 'qcovhsp'
9.4 years ago
rwn ▴ 590

Hello,

Can anyone provide a clear definition of what exactly the BLAST+ output parameters 'qcovs' and 'qcovhsp' are? From the help pages:

"qcovs means Query Coverage Per Subject"

"qcovhsp means Query Coverage Per HSP"

But beyond this I'm less sure. My interpretation is that qcovs is the query coverage summed over all HSPs for a given subject, but does it simply add up the HSPs, or does it account for possible overlap between HSP alignments? The first case could give coverages above 100%, whereas the second would be more appropriate (and more useful). My interpretation of qcovhsp is that it returns a coverage value specific to each individual HSP, but are these values reported one per line, or are they delimited in some way?
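To make the two interpretations concrete, here is a minimal Python sketch using two hypothetical, overlapping HSPs on a 100-base query. The coordinates and function names are made up for illustration, and coordinates are assumed to be 1-based, inclusive, with qstart <= qend. Summing per-HSP spans can exceed 100%, while counting each query position once cannot.

```python
def naive_sum_coverage(hsps, qlen):
    """Interpretation 1 (hypothetical): add up each HSP's query span, ignoring overlap."""
    total = sum(qend - qstart + 1 for qstart, qend in hsps)
    return 100.0 * total / qlen


def union_coverage(hsps, qlen):
    """Interpretation 2 (hypothetical): count each query position once, however many HSPs cover it."""
    covered = set()
    for qstart, qend in hsps:
        covered.update(range(qstart, qend + 1))
    return 100.0 * len(covered) / qlen


qlen = 100
hsps = [(1, 70), (41, 100)]  # made-up HSPs overlapping on query positions 41-70
print(naive_sum_coverage(hsps, qlen))  # 130.0, i.e. above 100%
print(union_coverage(hsps, qlen))      # 100.0, can never exceed 100%
```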

I've Googled around and looked through the BLAST documentation but couldn't find much; apologies if I've missed something obvious.

Cheers.

blast qcovs qcovhsp query coverage
9.4 years ago
Siva ★ 1.9k

The only documentation I could find was an old NCBI newsletter (2006/7), which states that "Query Coverage" is calculated the same way as "Total Score".

See at the very end of the following page:

http://www.ncbi.nlm.nih.gov/Web/Newsltr/V15N2/BLView.html

Edit: I have edited my post, since the linked page does not make clear whether overlap is taken into account.


Hi Siva,

Thanks for the response. I think that documentation is a little old for these options: according to this blog page, qcovs and qcovhsp were added in 2.2.28, sometime around 2012/13. Also, I've anecdotally noticed in my own results that qcovs never seems to exceed 100%, so I guess it does account for overlap among HSPs. I'm just amazed that there isn't more precise information on what these parameters are and how they are calculated!
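One way to repeat that anecdotal check on your own results is to request both fields in tabular output, for example with `-outfmt "6 qseqid sseqid qstart qend qcovs qcovhsp"`, and scan the file. The Python sketch below assumes exactly that column order (edit `COLUMNS` if yours differs) and uses made-up example lines; the function name is hypothetical. It collects the distinct qcovs values seen for each query/subject pair and the overall maximum. If qcovs is indeed a per-subject value, each pair should show a single value even though each line is a separate HSP.

```python
import csv
from collections import defaultdict

# Assumed column order: must match the -outfmt "6 ..." field list BLAST was run with.
COLUMNS = ["qseqid", "sseqid", "qstart", "qend", "qcovs", "qcovhsp"]


def summarize_qcovs(lines):
    """Return ({(qseqid, sseqid): set of qcovs values}, highest qcovs seen)."""
    per_pair = defaultdict(set)
    max_qcovs = 0.0
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        rec = dict(zip(COLUMNS, row))
        qcovs = float(rec["qcovs"])
        per_pair[(rec["qseqid"], rec["sseqid"])].add(qcovs)
        max_qcovs = max(max_qcovs, qcovs)
    return dict(per_pair), max_qcovs


# Made-up lines: two HSPs of query q1 on subject s1.
example = [
    "q1\ts1\t1\t70\t100\t70",
    "q1\ts1\t41\t100\t100\t60",
]
pairs, top = summarize_qcovs(example)
print(pairs)  # {('q1', 's1'): {100.0}}
print(top)    # 100.0
```

To run it on a real file, pass an open file handle (e.g. `open("hits.tsv")`, a hypothetical path) instead of the example list.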


Your assumption seems to be correct (overlap is taken into account). Digging into the BLAST source code, I came across a file where many of these parameters are defined. See line 110 of the following page: http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/include/objects/seqalign/Seq_align.hpp#L54

From what I understand, 'pct_coverage' is the 'qcovs'. It is the percentage of bases in the query sequence that are aligned to the subject sequence (as either a match or a mismatch). A base may be covered by one HSP or by several overlapping HSPs, but it is counted only once. Gaps (in the subject sequence) are treated as mismatches.
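A minimal Python sketch of that definition, for illustration only: qcovs as the union of all HSP query spans on one subject, and qcovhsp as the span of a single HSP, both as a percentage of the query length. The function names, the 1-based inclusive coordinates and the rounding to whole numbers are assumptions, not taken from the BLAST source. Merging overlapping intervals is one way to count each base only once.

```python
def merge_intervals(intervals):
    """Merge 1-based inclusive (start, end) intervals that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def qcovhsp(qstart, qend, qlen):
    """Hypothetical per-HSP coverage: the HSP's query span as a % of query length."""
    # Query bases opposite subject gaps lie inside qstart..qend, so they are counted,
    # consistent with gaps being treated as mismatches.
    # Python's round() sends halves to even; BLAST's own rounding rule is an unknown here.
    return round(100 * (qend - qstart + 1) / qlen)


def qcovs(hsps, qlen):
    """Hypothetical per-subject coverage: union of all HSP query spans, each base counted once."""
    covered = sum(end - start + 1 for start, end in merge_intervals(hsps))
    return round(100 * covered / qlen)


hsps = [(1, 60), (40, 90)]  # made-up HSPs of one 100-base query on one subject
print([qcovhsp(s, e, 100) for s, e in hsps])  # [60, 51]
print(qcovs(hsps, 100))                       # 90
```

The overlap (positions 40-60) is counted once for qcovs, so it stays below the 111% a plain sum of the two qcovhsp values would give.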


I'm not sure this is quite right, or at least there seems to be another caveat. Here is an example from my BLASTn results where I get qcovs = 100, but the alignment starts at position 3 of the query, i.e. the first 2 positions are not aligned at all. By your definition, qcovs should then be < 100. Hopefully I am not misunderstanding something.

row  qstart  qend  qcovs   pident
57        3    31  100.0  100.000
115       3    31  100.0  100.000
293      25    54   56.0  100.000
298       3    37   95.0   97.143
302       2    44   98.0   93.023
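A quick arithmetic check of that reasoning, in Python. Since qend cannot exceed the query length, a single HSP covers at most (qend - qstart + 1) / qend of the query. The rows are copied from the table above; treating the query length as equal to qend is an assumption that gives the largest coverage one HSP could reach, and the function name is hypothetical.

```python
def max_single_hsp_coverage(qstart, qend):
    """Largest % of the query one HSP can cover, assuming qlen = qend (qlen is never smaller)."""
    return 100 * (qend - qstart + 1) / qend


# (row, qstart, qend, reported qcovs) copied from the table above
rows = [
    (57, 3, 31, 100.0),
    (115, 3, 31, 100.0),
    (293, 25, 54, 56.0),
    (298, 3, 37, 95.0),
    (302, 2, 44, 98.0),
]

for row, qstart, qend, reported in rows:
    bound = max_single_hsp_coverage(qstart, qend)
    print(f"row {row}: reported qcovs {reported}, single-HSP bound {bound:.1f} (~{round(bound)})")
```

Under that assumption the bound rounds to the reported qcovs for rows 293, 298 and 302, but only to 94 for rows 57 and 115, which report 100. Under the pct_coverage definition quoted earlier, that would need other HSPs on the same subject (not shown here) to cover positions 1-2; the table alone cannot confirm this.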