Question: BLAST: How much of the query is aligned?
0
4.8 years ago by
bongbang70
United States
bongbang70 wrote:

At first, I thought this question would be answered by the "qcovs" field, but a glance at the results showed that this isn't the case. To begin with, each qcovs value relates not to the original query, but to a smaller query partitioned from it. And I don't even know what this number actually means for those mini-queries. "Query Coverage Per Subject" is what the manual says, but apparently they use it in a different sense from what I would normally understand.

Second, "length" is supposed to be "length of alignment," but I'm not sure what that means, either. It's neither the length of the mini-query (qend-qstart+1) nor that of the corresponding subject, although there's a strong correlation between the three.

My purpose is to see whether the genome assembler succeeded in putting together a conserved gene of interest. As a measure of how well each original (unpartitioned) gene query is assembled, I'm thinking of either:

max([set of "nident" from all mini-queries based on the same original query])/original query length

or

max([set of "length" from all mini-queries based on the same original query])/original query length

Which one, if any, is the right approach? Please feel free to suggest your own, although I would appreciate an explanation of what I got wrong. An elucidation of "qcovs" and "length" would be nice, too. Thank you.

Please check my recent comment in another thread.

C: BLAST definition and difference between 'qcovs' and 'qcovhsp'

Comment written 4.7 years ago by Siva
1
4.8 years ago by
Istvan Albert ♦♦ 81k
University Park, USA
Istvan Albert ♦♦ 81k wrote:

It is indeed surprisingly difficult to find definitions even for seemingly simple concepts. I wanted to double-check what I am going to say but couldn't find any source of information. Here it goes anyway: I believe that alignment length refers to the number of matched or mismatched bases of the query.
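To make the relationship between "length", the query span, and the subject span concrete, here is a minimal sketch on a toy alignment. It assumes that "length" counts every column of the alignment, gap columns included, which would explain why it matches neither qend-qstart+1 nor the subject span while staying close to both. The function name `alignment_stats` and the two gapped strings are made up for illustration; they are not BLAST output.

```python
def alignment_stats(aligned_query: str, aligned_subject: str) -> dict:
    """Count column types in a pairwise alignment given as two gapped strings."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned strings must have equal length")

    nident = mismatch = gaps = 0
    for q, s in zip(aligned_query, aligned_subject):
        if q == "-" or s == "-":
            gaps += 1          # gap column (in either sequence)
        elif q == s:
            nident += 1        # identical column
        else:
            mismatch += 1      # aligned but different

    return {
        # Assumption: "length" is the number of alignment columns, gaps included.
        "length": len(aligned_query),
        "nident": nident,
        "mismatch": mismatch,
        "gaps": gaps,
        # Residues actually consumed from each sequence (qend-qstart+1 and its
        # subject counterpart for a forward-strand hit).
        "query_span": sum(c != "-" for c in aligned_query),
        "subject_span": sum(c != "-" for c in aligned_subject),
    }


# Toy alignment: one gap in the query, two gaps in the subject, one mismatch.
stats = alignment_stats("ACGT-ACGTAC", "ACGTTAC--AG")
print(stats)
```

Under this assumption, length = nident + mismatch + gaps, so the three numbers differ only by the number of gap columns.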

In general it is difficult to characterize with a single measure how similar sequences are. A single number works OK when the sequences are very similar and kind of breaks down as they become more dissimilar.

0
4.7 years ago by
Siva1.6k
United States
Siva1.6k wrote:

Are you sure qcovs is not giving the query coverage per subject? There is also another option, qcovhsp, which gives Query Coverage Per HSP.

If you do want to calculate the query coverage yourself, take into account that the HSPs could overlap on the query.
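A minimal sketch of handling the overlaps: merge the query intervals of all HSPs for each query/subject pair before summing, so that positions hit twice are counted once. The tuple layout (qseqid, sseqid, qstart, qend), the function names, and the example numbers are assumptions for illustration; in practice the values would come from a tabular BLAST output that includes those columns, with the query lengths supplied separately.

```python
from collections import defaultdict


def merged_length(intervals):
    """Number of positions covered by 1-based inclusive intervals, overlaps counted once."""
    total = 0
    cur_start = cur_end = None
    # Normalise each interval so start <= end, then sweep in sorted order.
    for start, end in sorted((min(a, b), max(a, b)) for a, b in intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start + 1   # close the previous block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)            # extend the current block
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def query_coverage(hsps, query_lengths):
    """Fraction of each query covered per subject.

    hsps: iterable of (qseqid, sseqid, qstart, qend) tuples (hypothetical layout).
    query_lengths: dict mapping qseqid to the full query length.
    """
    by_pair = defaultdict(list)
    for qseqid, sseqid, qstart, qend in hsps:
        by_pair[(qseqid, sseqid)].append((qstart, qend))
    return {
        pair: merged_length(ivals) / query_lengths[pair[0]]
        for pair, ivals in by_pair.items()
    }


# Made-up HSPs: the first two overlap on query positions 81-100.
hsps = [
    ("geneA", "contig1", 1, 100),
    ("geneA", "contig1", 81, 150),
    ("geneA", "contig1", 301, 400),
    ("geneA", "contig2", 1, 50),
]
cov = query_coverage(hsps, {"geneA": 500})
print(cov)
```

Taking the maximum of these per-subject fractions for a gene gives one number per original query. Note that this measures how much of the query is aligned at all, not how many of those positions are identical.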
