blast query coverage using perl
1
0
Entering edit mode
6.7 years ago

How can I calculate query coverage using perl If I have a file like this. I want to calculate it on my own using perl.

Query id,   q. start,   q. end,         gene_id Q.length    Calculated_q_coverage
lcl|NZ_CP018664.1_gene_1    1   1149        NZ_CP018664.1_gene_1    1149    
lcl|NZ_CP018664.1_gene_2    1   837     NZ_CP018664.1_gene_2    837 
lcl|NZ_CP018664.1_gene_3    1   606     NZ_CP018664.1_gene_3    606 
lcl|NZ_CP018664.1_gene_4    1   327     NZ_CP018664.1_gene_4    327 
lcl|NZ_CP018664.1_gene_5    1   471     NZ_CP018664.1_gene_5    471 
lcl|NZ_CP018664.1_gene_6    1   363     NZ_CP018664.1_gene_6    363 
lcl|NZ_CP018664.1_gene_7    1   645     NZ_CP018664.1_gene_7    645
blast • 2.3k views
ADD COMMENT
0
Entering edit mode

I have these 4 fields. Query id, q. start, q. end, gene_id and want to calculate query coverage using perl.

ADD REPLY
0
Entering edit mode

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. Additional information should be added to the original post by editing it.

Also for reference, you should validate (up-vote or accept answers) comments/posts that help with your questions. That shows appreciation for those who help and also provides closure to open threads.

ADD REPLY
0
Entering edit mode

I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar, see image below:

101010 Button

ADD REPLY
0
Entering edit mode

want to calculate query coverage using perl <=== what that mean ? you already have coverage field 5 or you want the same result as field 5 ?

ADD REPLY
0
Entering edit mode

How is coverage defined in your case? Is it q.end-q.start? If so, you can do

cat test.txt | perl -ne 'my @test = split / /, $_;print ";print $test[2]-$test[1];'
ADD REPLY
0
Entering edit mode

extending a little more this;

perl -lane 'if (/^Query/) { $v = "Calculated_q_coverage" } else { $v = $F[2] - $F[1] + 1; } print join "\t", @F, $v;' < test.txt > test_out.txt
ADD REPLY
0
Entering edit mode

Why did not you divide it by query length. I could not get the hang of it. Please elaborate.

ADD REPLY
0
Entering edit mode

Because in your example your coverage field = query length ....

ADD REPLY
0
Entering edit mode
6.7 years ago
Michael 54k

Surprising answer maybe, you cannot calculate the correct query coverage from tabular output of blast.

Query coverage is not equal to: abs(qend-qstart)/qlength. While often weekly defined, I normally have seen query coverage as the proportion of sequence covered by the alignment. Why is this != to the difference of start and end? Because of the existence of HSPs and gaps. It could well be the case that there are multiple HSPs or large gaps in the alignment, such that parts of the sequence between qstart and qend are not covered by the alignment and therefore do not count for the coverage. If I am not mistaken, blast tabular report contains 1 line per hit, not per HSP.

Think of it in a similar way as with mRNA length vs gene length: the mRNA length is not equal to the difference of genomic start and stop of the transcript but the combined length of all exons in the transcript.

ADD COMMENT
0
Entering edit mode

So, the only relevant thing to be calculated here is relative coverage then?

ADD REPLY
0
Entering edit mode

what do you mean by relative coverage?

ADD REPLY
0
Entering edit mode

abs(qend-qstart)/qlength *Nx;where Nx is the ratio between sequence length with and without gaps included (thus a hypothetical measure), if Nx is of course, accessible.

ADD REPLY

Login before adding your answer.

Traffic: 2565 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6