Question: Convert coverage values to number of occurrences
7 weeks ago, genomes_and_MGEs wrote:

Hi everyone,

I have a summary report like this:

#FILE   NUM_FOUND   KPHS_23120  abaR    astA
file1.tab   1   .   100.00;99.86;99.86;100.00   .
file2.tab   2   .   .   .
file3.tab   18  .   .   98.29;98.29;98.29

The values are coverages, one per copy found, separated by semicolons. For example, in file1.tab the abaR gene is present in 4 copies. I'm not interested in the coverage itself, though: I'd like to convert each cell to the number of occurrences of that gene, so that the table looks like this:

#FILE   NUM_FOUND   KPHS_23120  abaR    astA
file1.tab   1   0   4   0
file2.tab   2   0   0   0
file3.tab   18  0   0   3

It's easy to replace the . with 0, but I'm not sure how to replace the coverage values with the number of occurrences.

Thanks a lot!

7 weeks ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:
awk '/^#/ {print;next} {printf("%s\t%s",$1,$2);for(i=3;i<=NF;i++) {printf("\t%d",$i=="."?0:split($i,a,/[;]/));} printf("\n");}'  input.txt
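
For anyone who wants to see what the one-liner is doing, here is an expanded sketch of the same approach. It relies on the fact that awk's split() returns the number of pieces it produced, so splitting a cell on ";" gives the copy count directly. The function name count_copies and the array name parts are made up for this illustration, and the sketch assumes the columns are separated by tabs or spaces, as in the question.

```shell
#!/bin/sh
# count_copies is a hypothetical helper name used only for this sketch.
# It reads the summary table from a file argument (or stdin) and prints
# the same table with each coverage cell replaced by its copy count.
count_copies() {
  awk '
    # Header line: pass it through unchanged.
    /^#/ { print; next }
    {
      # Keep the file name and NUM_FOUND columns as they are.
      printf("%s\t%s", $1, $2)
      for (i = 3; i <= NF; i++) {
        # "." means the gene is absent; otherwise split() returns the
        # number of ";"-separated coverage values, i.e. the copy count.
        n = ($i == ".") ? 0 : split($i, parts, ";")
        printf("\t%d", n)
      }
      printf("\n")
    }
  ' "$@"
}

# Example run on the header and first data row from the question.
printf '#FILE\tNUM_FOUND\tKPHS_23120\tabaR\tastA\nfile1.tab\t1\t.\t100.00;99.86;99.86;100.00\t.\n' | count_copies
```

The example run prints the header followed by the row file1.tab, 1, 0, 4, 0 (tab-separated). To process a whole report, call it as count_copies input.txt.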