Question

Sum values in a row and compare the results to a value with awk or sed

2.6 years ago

M • 0

I have a file organized as below

>ERR1017187.315   32630:2 0:37 32630:7 0:71 |:| 0:25 32630:10 0:82                                                                        
>ERR1017187.333   32630:2 0:37 32630:3 0:75 |:| 0:117  
>ERR1017187.336   32630:1 0:37 32630:6 0:73 |:| 0:117                                                                                         
>ERR1017187.358   32630:3 0:35 32630:2 0:77 |:| 0:117

Basically, for every row, I need to sum the X values in "32630:X" and keep the row if the sum is above 20. For instance, the first row gives 2+7+10=27, so I keep the row.

The second row gives 2+3=5, so I discard it.

How can I achieve that using awk or sed?

awk kraken sed • 927 views

This might work. First determine the sum for each line and store the sums in sums.txt. The command below reads your file line by line, uses sed to insert a newline before every occurrence of 32630:, keeps only the lines that start with 32630 using grep, then uses awk to sum the values.

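# Quote the variables and use read -r so whitespace and backslashes survive;
# sum+0 prints 0 (not an empty line) when a row has no 32630: entries.
while IFS= read -r LINE; do
   printf '%s\n' "$LINE" | sed 's/32630:/\n32630:/g' | grep '^32630' | awk -v FS='[: ]' '{sum=sum+$2}END{print sum+0}'
done < "$FILE" > sums.txt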

Then you can use paste and awk to print lines with sum above 20:

paste -d'\0' sums.txt "$FILE" | awk -v FS=">" '$1+0 > 20 {print ">" $2}'

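If you are using macOS, the first command might not work, because the BSD sed shipped there does not treat \n in the replacement as a newline. Instead, you can put a literal line break after a backslash: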

while IFS= read -r LINE; do printf '%s\n' "$LINE" | sed 's/32630:/\
32630:/g' | grep '^32630' | awk -v FS='[: ]' '{sum=sum+$2}END{print sum+0}'
done < "$FILE" > sums.txt
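
As an alternative to the two-step approach above, the same filter can probably be done in a single awk pass with no intermediate sums.txt. This is only a sketch: the function name filter_rows and the awk variables taxid and min are made up for illustration, and it assumes every entry after the read ID has the form ID:COUNT separated by whitespace, as in the example rows.

```shell
# Hypothetical helper: print only the rows whose 32630:X counts sum to more than 20.
# Reads from the files given as arguments, or from stdin if none are given.
filter_rows() {
  awk -v taxid=32630 -v min=20 '{
    sum = 0
    # Field 1 is the read ID; every later field should look like ID:COUNT.
    for (i = 2; i <= NF; i++) {
      n = split($i, kv, ":")
      if (n == 2 && kv[1] == taxid) sum += kv[2]
    }
    if (sum > min) print   # prints the original row unchanged
  }' "$@"
}

# Demo on the first two rows from the question (sums are 19 without the last
# entry of row one, so it is the 32630:10 that pushes it to 27).
filter_rows <<'EOF'
>ERR1017187.315   32630:2 0:37 32630:7 0:71 |:| 0:25 32630:10 0:82
>ERR1017187.333   32630:2 0:37 32630:3 0:75 |:| 0:117
EOF
```

On the two demo rows this keeps only ERR1017187.315 (2+7+10=27) and drops ERR1017187.333 (2+3=5). For a real file you would call it as filter_rows "$FILE" > filtered.txt. Because the field is split on ":" and the ID is compared as a whole, an entry such as 132630:5 is not counted, which the sed-based version would count.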