Sum values in a row and compare the results to a value with awk or sed
2.6 years ago
M • 0

I have a file organized as below

>ERR1017187.315   32630:2 0:37 32630:7 0:71 |:| 0:25 32630:10 0:82                                                                        
>ERR1017187.333   32630:2 0:37 32630:3 0:75 |:| 0:117  
>ERR1017187.336   32630:1 0:37 32630:6 0:73 |:| 0:117                                                                                         
>ERR1017187.358   32630:3 0:35 32630:2 0:77 |:| 0:117

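Basically, for every row I need to sum the X values in "32630:X" and keep the row only if the sum is above 20. For instance, the first row gives 2+7+10=19, which is not above 20, so I discard the row.

The second row gives 2+3=5, so I discard it as well. A row whose values sum to, say, 27 would be kept.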

How can I achieve that using awk or sed?

awk kraken sed • 923 views

This might work. First, compute the sum for each line and store the sums in sums.txt. The command below reads your file line by line, uses sed to insert a line break before every occurrence of 32630:, keeps only the resulting lines that start with 32630, and then uses awk to sum the values.

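while IFS= read -r LINE; do
   # put each 32630:X on its own line, keep only those lines, then sum X (0 if none)
   printf '%s\n' "$LINE" | sed 's/32630:/\n32630:/g' | grep '^32630:' | awk -v FS='[: ]' '{sum=sum+$2}END{print sum+0}'
done < "$FILE" > sums.txt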

Then you can use paste to put each sum in front of its original line, and awk to print only the lines whose sum is above 20:

paste -d'\0' sums.txt "$FILE" | awk -v FS=">" '{if($1>20) print ">"$2}'

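If you are using macOS, the first command might not work, because the BSD sed shipped there may not treat \n in the replacement as a line break. Instead you can try a literal line break after a backslash: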

while IFS= read -r LINE; do printf '%s\n' "$LINE" | sed 's/32630:/\
32630:/g' | grep '^32630:' | awk -v FS='[: ]' '{sum=sum+$2}END{print sum+0}'
done < "$FILE" > sums.txt
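
As an alternative to the two-step approach above, the same filter can probably be done in a single awk pass, without sed or an intermediate sums.txt. This is only a sketch under a few assumptions: fields are whitespace-separated, the first field is the read name, and every other field has the form ID:COUNT. The file names sample.txt and kept.txt, the variables taxid and min, and the read_keep row are made up for illustration; the other rows are copied from the question.

```shell
# Hypothetical sample input: four rows from the question plus one
# made-up row (read_keep) whose 32630 counts sum to 12+15=27.
cat > sample.txt <<'EOF'
>ERR1017187.315   32630:2 0:37 32630:7 0:71 |:| 0:25 32630:10 0:82
>ERR1017187.333   32630:2 0:37 32630:3 0:75 |:| 0:117
>ERR1017187.336   32630:1 0:37 32630:6 0:73 |:| 0:117
>ERR1017187.358   32630:3 0:35 32630:2 0:77 |:| 0:117
>read_keep   32630:12 0:37 32630:15 0:71 |:| 0:117
EOF

# For each row, split every field after the first on ":" and add up the
# counts whose ID equals taxid; print the row only if the sum exceeds min.
awk -v taxid=32630 -v min=20 '{
    sum = 0
    for (i = 2; i <= NF; i++) {
        n = split($i, kv, ":")
        if (n == 2 && kv[1] == taxid) sum += kv[2]
    }
    if (sum > min) print
}' sample.txt > kept.txt
```

With this sample, only the read_keep row should end up in kept.txt, since the rows from the question sum to 19, 5, 7 and 5. Replace sample.txt with your own file, and adjust taxid and min if you need a different ID or threshold.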

It worked, many thanks
