BLAST definition and difference between 'qcovs' and 'qcovhsp'
7.0 years ago
rwn ▴ 550

Hello,

Can anyone provide a clear definition of what exactly the BLAST+ output parameters 'qcovs' and 'qcovhsp' are? From the help pages:

"qcovs means Query Coverage Per Subject"

"qcovhsp means Query Coverage Per HSP"

But beyond this I'm less sure. My interpretation is that qcovs is the query coverage taken over all HSPs for a given subject, but does it simply sum the per-HSP coverages, or does it account for possible overlap among HSP alignments? The first case could result in coverages of > 100%, whereas the latter would be more appropriate (and more useful). My interpretation of qcovhsp is that it returns a coverage value specific to each individual HSP, but are these returned one per line, or are they delimited in some way?

I've Googled around and looked through the BLAST documentation but couldn't find much; apologies if I've missed something obvious.

Cheers.

blast qcovs qcovhsp query coverage • 13k views
7.0 years ago
Siva ★ 1.8k

The only documentation I could find was an old NCBI newsletter (2006/7), which states that the "Query Coverage" is calculated the same way as "Total Score".

See at the very end of the following page:

http://www.ncbi.nlm.nih.gov/Web/Newsltr/V15N2/BLView.html

Edit: I edited my post since it is not clear from the linked page whether overlap is taken into account.


Hi Siva,

Thanks for the response. I think that documentation is a little old for these functions - according to this blog page, the qcovs and qcovhsp options were added in 2.2.28 sometime around 2012/13. Also, I've anecdotally noticed from my own results that qcovs never seems to exceed 100%, so I guess it does account for overlap among HSPs. I'm just amazed that there is not more precise info on what these params are and/or how they are calculated! 


Your assumption seems to be correct (overlap information is taken into account). Digging into the BLAST source code, I stumbled on this part where many parameters are defined. Check line 110 on the following page:
http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/include/objects/seqalign/Seq_align.hpp#L54
From what I understand, 'pct_coverage' is the 'qcovs'. It is the percentage of bases in the query sequence that are aligned with the subject sequence (match or mismatch). The bases can be in one HSP or in several HSPs (which may overlap), but each base is counted only once. Gaps (in the subject sequence) are treated as mismatches.
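
To make the distinction concrete, here is a minimal Python sketch of the calculation as I understand it from the description above. It is not taken from the BLAST source: the function names, the query length, and the HSP coordinates are all made up for illustration, and I'm assuming 1-based inclusive query coordinates like the qstart/qend columns. It does not try to reproduce how BLAST rounds the reported value.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def coverage_per_hsp(qstart, qend, qlen):
    """Percent of the query spanned by one HSP (the qcovhsp idea)."""
    lo, hi = sorted((qstart, qend))
    return 100.0 * (hi - lo + 1) / qlen


def coverage_per_subject(hsps, qlen):
    """Percent of the query covered by the union of all HSPs against
    one subject, each query base counted once (the qcovs idea)."""
    intervals = [tuple(sorted(h)) for h in hsps]
    covered = sum(end - start + 1 for start, end in merge_intervals(intervals))
    return 100.0 * covered / qlen


# Hypothetical example: a 100 bp query with two overlapping HSPs
# (query coordinates) against the same subject.
qlen = 100
hsps = [(1, 60), (41, 90)]

per_hsp = [coverage_per_hsp(s, e, qlen) for s, e in hsps]
naive_sum = sum(per_hsp)                        # can exceed 100%
per_subject = coverage_per_subject(hsps, qlen)  # cannot exceed 100%

print(per_hsp, naive_sum, per_subject)
```

With these made-up coordinates the two per-HSP values add up to more than 100%, while the union-based value stays below 100% because bases 41-60 are counted only once. That would be consistent with the observation that qcovs never exceeds 100%.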
