Sum values in a row and compare the results to a value with awk or sed
2.4 years ago
M • 0

I have a file organized as follows:

>ERR1017187.315   32630:2 0:37 32630:7 0:71 |:| 0:25 32630:10 0:82
>ERR1017187.333   32630:2 0:37 32630:3 0:75 |:| 0:117
>ERR1017187.336   32630:1 0:37 32630:6 0:73 |:| 0:117
>ERR1017187.358   32630:3 0:35 32630:2 0:77 |:| 0:117

Basically, for every row I need to sum the X values in "32630:X" and keep the row only if the sum is above 20. For instance, the first row gives 2+7+10 = 19 and the second gives 2+3 = 5; a row is kept only when its sum exceeds 20, otherwise it is discarded.

How can I achieve that using awk or sed?
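As a quick check of the rule, the sketch below prints the summed 32630 counts for each sample row. The temporary file, the heredoc copy of the sample, and the function name row_sums are illustrative assumptions, not part of the original question.

```shell
#!/bin/sh
# Illustrative copy of the sample rows, written to a temporary file.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
>ERR1017187.315   32630:2 0:37 32630:7 0:71 |:| 0:25 32630:10 0:82
>ERR1017187.333   32630:2 0:37 32630:3 0:75 |:| 0:117
>ERR1017187.336   32630:1 0:37 32630:6 0:73 |:| 0:117
>ERR1017187.358   32630:3 0:35 32630:2 0:77 |:| 0:117
EOF

# row_sums (hypothetical helper): for each row, add up the X of every
# "32630:X" field and print the read ID followed by the sum.
row_sums() {
  awk '{
    sum = 0
    for (i = 2; i <= NF; i++) {            # field 1 is the read ID
      if (split($i, kv, ":") == 2 && kv[1] == "32630")
        sum += kv[2]
    }
    print $1, sum
  }' "$1"
}

row_sums "$sample"
# >ERR1017187.315 19
# >ERR1017187.333 5
# >ERR1017187.336 7
# >ERR1017187.358 5
```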

awk kraken sed

This might work. First, compute the sum for each line and store it in sums.txt. The loop below reads your file line by line, uses sed to start a new line every time 32630: appears, keeps only the lines that begin with 32630: (grep), and then uses awk to add up the values.

# One sum per input line, written to sums.txt in the same order as $FILE
while IFS= read -r LINE; do
   printf '%s\n' "$LINE" | sed 's/32630:/\n32630:/g' | grep '^32630:' | awk -v FS='[: ]' '{sum += $2} END {print sum+0}'
done < "$FILE" > sums.txt
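To see what the pipeline inside the loop does, this sketch runs it on the first sample row and keeps the intermediate result of each stage. It uses the backslash-newline form of the sed replacement (the macOS variant further down), which GNU sed also accepts; the variable names are illustrative.

```shell
#!/bin/sh
line='>ERR1017187.315   32630:2 0:37 32630:7 0:71 |:| 0:25 32630:10 0:82'

# Stage 1 (sed): start a new line before every "32630:".
split=$(printf '%s\n' "$line" | sed 's/32630:/\
32630:/g')

# Stage 2 (grep): keep only the lines that begin with the taxid.
kept=$(printf '%s\n' "$split" | grep '^32630:')

# Stage 3 (awk): with ":" and space as separators, $2 is the count after "32630:".
total=$(printf '%s\n' "$kept" | awk -v FS='[: ]' '{sum += $2} END {print sum+0}')

printf '%s\n' "$kept"          # three lines, starting 32630:2, 32630:7, 32630:10
printf 'sum: %s\n' "$total"    # sum: 19
```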

Then use paste to glue each sum onto the front of its line (-d'\0' sets an empty delimiter), and awk to print the lines whose sum is above 20:

paste -d'\0' sums.txt "$FILE" | awk -v FS='>' '$1 > 20 {print ">" $2}'
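The paste step works because the empty delimiter puts each sum directly in front of the ">" of its read ID, so awk splitting on ">" sees the sum as $1 and the rest of the row as $2. A minimal sketch with the first two sample rows and their sums (19 and 5); the threshold of 10 is an assumption chosen only so that one of the two rows passes.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Two sample rows and their sums, one sum per line in the same order.
printf '%s\n' '>ERR1017187.315   32630:2 0:37 32630:7 0:71 |:| 0:25 32630:10 0:82' \
              '>ERR1017187.333   32630:2 0:37 32630:3 0:75 |:| 0:117' > "$dir/rows.txt"
printf '%s\n' 19 5 > "$dir/sums.txt"

# -d'\0' means "no delimiter": "19>ERR1017187.315 ...", "5>ERR1017187.333 ..."
joined=$(paste -d'\0' "$dir/sums.txt" "$dir/rows.txt")
printf '%s\n' "$joined"

# Split on ">": $1 is the sum, $2 is the original row without its ">".
filtered=$(printf '%s\n' "$joined" | awk -v FS='>' '$1 > 10 {print ">" $2}')
printf '%s\n' "$filtered"   # only the >ERR1017187.315 row
```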

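If you are using macOS, the first command might not work, because BSD sed does not treat \n in the replacement as a newline. Instead, put an escaped literal newline in the replacement: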

while IFS= read -r LINE; do
   printf '%s\n' "$LINE" | sed 's/32630:/\
32630:/g' | grep '^32630:' | awk -v FS='[: ]' '{sum += $2} END {print sum+0}'
done < "$FILE" > sums.txt
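A single awk pass can also do the whole job without the intermediate sums.txt file. This is a swapped-in alternative, not the method above: it splits every field on ":" and compares the whole taxid field, so a taxid that merely ends in 32630 (for example 132630) is not counted, which the sed-based split would do. The helper name keep_rows, its arguments, and the temporary sample file are hypothetical.

```shell
#!/bin/sh
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
>ERR1017187.315   32630:2 0:37 32630:7 0:71 |:| 0:25 32630:10 0:82
>ERR1017187.333   32630:2 0:37 32630:3 0:75 |:| 0:117
>ERR1017187.336   32630:1 0:37 32630:6 0:73 |:| 0:117
>ERR1017187.358   32630:3 0:35 32630:2 0:77 |:| 0:117
EOF

# keep_rows FILE TAXID MIN (hypothetical helper): print the rows whose
# summed counts for TAXID are strictly greater than MIN.
keep_rows() {
  awk -v taxid="$2" -v min="$3" '
    {
      sum = 0
      for (i = 2; i <= NF; i++) {        # field 1 is the read ID
        n = split($i, kv, ":")           # "32630:7" -> kv[1]="32630", kv[2]="7"
        if (n == 2 && kv[1] == taxid)    # compare the whole taxid field
          sum += kv[2]
      }
      if (sum > min) print               # print the row unchanged
    }' "$1"
}

keep_rows "$sample" 32630 20   # prints nothing for this sample (largest sum is 19)
keep_rows "$sample" 32630 10   # prints only the >ERR1017187.315 row
```

For the real file the call would be keep_rows "$FILE" 32630 20 > filtered.txt.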

It worked, many thanks

