Convert coverage values to number of occurrences
3.4 years ago

Hi everyone,

I have a summary of reports like this:

#FILE   NUM_FOUND   KPHS_23120  abaR    astA
file1.tab   1   .   100.00;99.86;99.86;100.00   .
file2.tab   2   .   .   .
file3.tab   18  .   .   98.29;98.29;98.29

The values are coverages; for example, in file1.tab the abaR gene is present in 4 copies. However, I'm not interested in the coverage itself: I'd like to convert each field to the number of occurrences of the gene, so that the table looks like this:

#FILE   NUM_FOUND   KPHS_23120  abaR    astA
file1.tab   1   0   4   0
file2.tab   2   0   0   0
file3.tab   18  0   0   3

It's easy to replace the . with 0, but I'm not sure how to replace the coverage values with the number of occurrences.

Thanks a lot!

3.4 years ago
awk '/^#/ {print; next} {printf("%s\t%s", $1, $2); for (i = 3; i <= NF; i++) printf("\t%d", $i == "." ? 0 : split($i, a, ";")); printf("\n")}' input.txt
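For anyone who wants to see how the one-liner above works, here is a commented sketch of the same idea run on the sample from the question. It assumes the input is tab-separated; the function name count_occurrences and the temporary sample file are made up for illustration and are not part of the original answer. The key point is that awk's split() returns the number of pieces it produced, so splitting a field on ";" gives the number of copies directly.

```shell
# Hypothetical helper wrapping the awk program; takes the summary file as $1.
count_occurrences() {
    awk '
        # Header line: print unchanged and move on.
        /^#/ { print; next }
        {
            # File name and NUM_FOUND are kept as they are.
            printf("%s\t%s", $1, $2)
            # A "." means the gene is absent (0); otherwise split() returns
            # how many ";"-separated coverage values the field holds.
            for (i = 3; i <= NF; i++)
                printf("\t%d", $i == "." ? 0 : split($i, a, ";"))
            printf("\n")
        }
    ' "$1"
}

# Sample input from the question, written tab-separated to a temporary file.
sample=$(mktemp)
printf '#FILE\tNUM_FOUND\tKPHS_23120\tabaR\tastA\n' > "$sample"
printf 'file1.tab\t1\t.\t100.00;99.86;99.86;100.00\t.\n' >> "$sample"
printf 'file2.tab\t2\t.\t.\t.\n' >> "$sample"
printf 'file3.tab\t18\t.\t.\t98.29;98.29;98.29\n' >> "$sample"

count_occurrences "$sample"
```

On this sample it should reproduce the table requested in the question (4 for abaR in file1.tab, 3 for astA in file3.tab, 0 elsewhere).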