Question: Sum up values for a specific column of multiple files
dazhudou1122 wrote (5 weeks ago):

Dear Biostar community,

I have ~200 files of mapping results from BBMap. Let's say I have a file A.txt, and it looks like this:

#name   %unambiguousReads   unambiguousMB   %ambiguousReads ambiguousMB unambiguousReads    ambiguousReads  assignedReads   assignedBases
CP048304    0.00133 0.021248    0   0   147 0   158 22855
CP048305    0.00122 0.019355    0   0   135 0   146 20964
CP048306    0.00063 0.009802    0   0   69  0   81  11554
CP048307    0.0006  0.009519    0   0   66  0   78  11271
CP048308    0.00056 0.008937    0   0   62  0   76  10980
CP048309    0.00046 0.007286    0   0   51  0   57  8157
CP048310    0.00031 0.004859    0   0   34  0   38  5441
CP048311    0.00026 0.004082    0   0   29  0   38  5393
CP048312    0.00022 0.003489    0   0   24  0   34  4945
CP048313    0.00016 0.002588    0   0   18  0   22  3170
CP048314    0.00016 0.002498    0   0   18  0   30  4250

I want to sum up the "unambiguousReads" column (the sixth column) and output the result to a file in which the first column is the file name and the second column is the sum, like this:

A 653
B 550
C 375
...

I tried many methods after googling but none worked so far. Can you please help?

Thank you!

Best,

Wenhan


There are plenty of quick-and-dirty ways to do it. What have you tried?

written 5 weeks ago by geek_y
lakhujanivijay wrote (5 weeks ago):
awk -F '\t' '{sum += $6} END {print sum}' A.txt

Explained here

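For reference, here is a minimal sketch of the same awk pattern applied to the sixth column of a small excerpt of the table from the question. The file name `demo_A.txt` is made up for illustration, and the `NR > 1` guard and `sum + 0` are additions not in the original answer: the first skips the header row explicitly, the second prints 0 instead of an empty line when there are no data rows.

```shell
# Build a small tab-separated demo file (hypothetical name; rows copied from the question).
printf '#name\t%%unambiguousReads\tunambiguousMB\t%%ambiguousReads\tambiguousMB\tunambiguousReads\tambiguousReads\tassignedReads\tassignedBases\n' > demo_A.txt
printf 'CP048304\t0.00133\t0.021248\t0\t0\t147\t0\t158\t22855\n' >> demo_A.txt
printf 'CP048305\t0.00122\t0.019355\t0\t0\t135\t0\t146\t20964\n' >> demo_A.txt
printf 'CP048306\t0.00063\t0.009802\t0\t0\t69\t0\t81\t11554\n' >> demo_A.txt

# -F '\t' splits fields on tabs; NR > 1 skips the header line;
# sum + 0 forces a numeric result even if no data rows were read.
col_sum=$(awk -F '\t' 'NR > 1 {sum += $6} END {print sum + 0}' demo_A.txt)
echo "$col_sum"   # prints 351 (147 + 135 + 69)
```

Without the `NR > 1` guard the result would be the same here, since awk treats the non-numeric header text in column 6 as 0 when adding.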

Dear lakhujanivijay,

Thank you! It worked great! I tried a lot of other solutions, but none of them worked. Thank you for the explanation as well. I will list the complete code below:

# Record the input file names, then sum column 6 of each file.
ls *.txt > samples.txt
while read -r file; do
    sum=$(awk -F '\t' '{sum += $6} END {print sum}' "$file")
    echo "$file $sum"
done < samples.txt > Summary.txt
written 5 weeks ago by dazhudou1122
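A possible variant, sketched under a few assumptions: a single awk process can read all files in one go and print the file name without its `.txt` extension, which matches the `A 653` layout requested in the question. The demo files, their values, the scratch directory, and the output name `Summary.out` are all made up for illustration. The sketch also assumes every input file starts with a header line, because `FNR == 1` is used to detect the start of each file.

```shell
# Work in a scratch directory so the demo files do not mix with real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two tiny tab-separated inputs (hypothetical file names and values).
printf '#name\tc2\tc3\tc4\tc5\tunambiguousReads\nX1\t0\t0\t0\t0\t147\nX2\t0\t0\t0\t0\t135\n' > A.txt
printf '#name\tc2\tc3\tc4\tc5\tunambiguousReads\nY1\t0\t0\t0\t0\t50\nY2\t0\t0\t0\t0\t25\n' > B.txt

# One awk process reads every file; FNR == 1 is the header line of each new file.
awk -F '\t' '
    FNR == 1 {
        if (name != "") print name, sum + 0    # flush the total of the previous file
        name = FILENAME
        sub(/\.txt$/, "", name)                # drop the extension for the label
        sum = 0
        next                                   # do not add the header row
    }
    { sum += $6 }
    END { if (name != "") print name, sum + 0 }   # flush the total of the last file
' A.txt B.txt > Summary.out

cat Summary.out
```

In a real run, `A.txt B.txt` would presumably be replaced by `*.txt`. Writing the summary under a name that the glob does not match (here `Summary.out`) keeps it from being picked up as an input on a later run.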