blast query coverage using perl
1
0
Entering edit mode
4.2 years ago

How can I calculate query coverage using perl If I have a file like this. I want to calculate it on my own using perl.

Query id,   q. start,   q. end,         gene_id Q.length    Calculated_q_coverage
lcl|NZ_CP018664.1_gene_1    1   1149        NZ_CP018664.1_gene_1    1149
lcl|NZ_CP018664.1_gene_2    1   837     NZ_CP018664.1_gene_2    837
lcl|NZ_CP018664.1_gene_3    1   606     NZ_CP018664.1_gene_3    606
lcl|NZ_CP018664.1_gene_4    1   327     NZ_CP018664.1_gene_4    327
lcl|NZ_CP018664.1_gene_5    1   471     NZ_CP018664.1_gene_5    471
lcl|NZ_CP018664.1_gene_6    1   363     NZ_CP018664.1_gene_6    363
lcl|NZ_CP018664.1_gene_7    1   645     NZ_CP018664.1_gene_7    645

blast • 1.3k views
0
Entering edit mode

I have these 4 fields. Query id, q. start, q. end, gene_id and want to calculate query coverage using perl.

0
Entering edit mode

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. Additional information should be added to the original post by editing it.

Also for reference, you should validate (up-vote or accept answers) comments/posts that help with your questions. That shows appreciation for those who help and also provides closure to open threads.

0
Entering edit mode

I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar, see image below:

0
Entering edit mode

want to calculate query coverage using perl <=== what that mean ? you already have coverage field 5 or you want the same result as field 5 ?

0
Entering edit mode

How is coverage defined in your case? Is it q.end-q.start? If so, you can do

cat test.txt | perl -ne 'my @test = split / /, $_;print ";print$test[2]-$test[1];'  ADD REPLY 0 Entering edit mode extending a little more this; perl -lane 'if (/^Query/) {$v = "Calculated_q_coverage" } else { $v =$F[2] - $F[1] + 1; } print join "\t", @F,$v;' < test.txt > test_out.txt

0
Entering edit mode

Why did not you divide it by query length. I could not get the hang of it. Please elaborate.

0
Entering edit mode

0
Entering edit mode
4.2 years ago

Surprising answer maybe, you cannot calculate the correct query coverage from tabular output of blast.

Query coverage is not equal to: abs(qend-qstart)/qlength. While often weekly defined, I normally have seen query coverage as the proportion of sequence covered by the alignment. Why is this != to the difference of start and end? Because of the existence of HSPs and gaps. It could well be the case that there are multiple HSPs or large gaps in the alignment, such that parts of the sequence between qstart and qend are not covered by the alignment and therefore do not count for the coverage. If I am not mistaken, blast tabular report contains 1 line per hit, not per HSP.

Think of it in a similar way as with mRNA length vs gene length: the mRNA length is not equal to the difference of genomic start and stop of the transcript but the combined length of all exons in the transcript.

0
Entering edit mode

So, the only relevant thing to be calculated here is relative coverage then?

0
Entering edit mode

what do you mean by relative coverage?

0
Entering edit mode

abs(qend-qstart)/qlength *Nx;where Nx is the ratio between sequence length with and without gaps included (thus a hypothetical measure), if Nx is of course, accessible.