Convert coverage values to number of occurrences
17 months ago

Hi everyone,

I have a summary of reports like this:

#FILE   NUM_FOUND   KPHS_23120  abaR    astA
file1.tab   1   .   100.00;99.86;99.86;100.00   .
file2.tab   2   .   .   .
file3.tab   18  .   .   98.29;98.29;98.29

The values represent the coverage, with one value per copy of the gene; for example, in file1.tab the abaR gene is present in 4 copies. However, I'm not interested in the coverage itself. I'd like to convert each cell to the number of occurrences of the gene, so that the table looks like this:

#FILE   NUM_FOUND   KPHS_23120  abaR    astA
file1.tab   1   0   4   0
file2.tab   2   0   0   0
file3.tab   18  0   0   3

It's easy to replace the . with 0, but I'm not sure how to replace the coverage values with the number of occurrences.

Thanks a lot!

sequence gene • 260 views
17 months ago
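awk '
/^#/ { print; next }                  # header line: print unchanged
{
    printf("%s\t%s", $1, $2)           # keep FILE and NUM_FOUND as they are
    for (i = 3; i <= NF; i++)          # "." means absent (0); otherwise count the ";"-separated values
        printf("\t%d", ($i == "." ? 0 : split($i, a, ";")))
    printf("\n")
}' input.txt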
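
For anyone who wants to check the idea on the example from the question, here is a minimal sketch of the same approach wrapped in a shell function. The function name `count_occurrences` and the temporary sample file are made up for illustration, and the sketch assumes the real table is tab-separated (the command above splits on any whitespace instead, so it works either way).

```shell
# Hypothetical helper: replace every gene cell with the number of
# ";"-separated coverage values it holds ("." becomes 0).
# Assumes a tab-separated table whose header line starts with "#".
count_occurrences() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
    /^#/ { print; next }               # header: print unchanged
    {
        for (i = 3; i <= NF; i++)      # columns 1-2 are FILE and NUM_FOUND
            $i = ($i == "." ? 0 : split($i, parts, ";"))
        print
    }' "$1"
}

# Recreate the example table from the question in a temporary file.
sample=$(mktemp)
printf '#FILE\tNUM_FOUND\tKPHS_23120\tabaR\tastA\n'       >  "$sample"
printf 'file1.tab\t1\t.\t100.00;99.86;99.86;100.00\t.\n'  >> "$sample"
printf 'file2.tab\t2\t.\t.\t.\n'                          >> "$sample"
printf 'file3.tab\t18\t.\t.\t98.29;98.29;98.29\n'         >> "$sample"

count_occurrences "$sample"
# Data lines printed (tab-separated):
# file1.tab  1   0  4  0
# file2.tab  2   0  0  0
# file3.tab  18  0  0  3
```

Assigning to `$i` makes awk rebuild the line with `OFS`, which is why both `-F` and `OFS` are set to a tab here; an empty cell would also count as 0, because `split` returns 0 for an empty string.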