How to calculate percentages of values in specific rows of a file?
21 months ago
FelipeMSD • 0

I built a file in which header lines hold generic IDs that identify multiple alignments, and the rows after each header list the genome IDs that belong to that alignment. Next to each genome ID is a number that I would like to convert into a percentage of its alignment's total, adding the percentage as a third column, as in the example below:

ORIGINAL FILE:

>Alignment_1
GCA_910584205.1  13
GCA_003584705.1  7
>Alignment_2
GCA_002361735.1  168
GCA_002492725.1  2880
GCA_002492725.1  2880
>Alignment_3
GCA_900540295.1  165
GCA_002490525.1  125

FINAL FILE:

>Alignment_1
GCA_910584205.1  13 65%
GCA_003584705.1  7 35%
>Alignment_2
GCA_002361735.1  168 3%
GCA_002492725.1  2880 49%
GCA_002492725.1  2880 49%
>Alignment_3
GCA_900540295.1  165 57%
GCA_002490525.1  125 43%

I know awk can calculate each value in a column as a percentage of the column total, but how can I get these percentages per group of rows, where the groups are delimited by the alignment headers?

percentages awk alignment • 560 views
21 months ago
FelipeMSD • 0

I used the code suggested by Ed Morton at this link: https://unix.stackexchange.com/questions/710154/how-to-calculate-percentages-of-values-in-specific-rows-of-a-file/710243#710243

$ cat tst.awk
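/^>/ {                      # header line: starts a new alignment group
    if ( NR>1 ) {
        prt()               # print the group collected so far
    }
    key = $0
    cnt = tot = 0
    next
}
{
    ids[++cnt] = $1         # genome ID
    vals[cnt]  = $2         # value for that genome
    tot += $2               # running total for the current group
}
END { prt() }               # print the last group

# Print the header, then each row with its percentage of the group total.
function prt(           i) {
    print key
    for ( i=1; i<=cnt; i++ ) {
        print ids[i], vals[i], ceil( (tot ? vals[i] / tot : 0) * 100 )"%"
    }
}

# Round x up to the nearest integer.
function ceil(x,        y) {
    y = int(x)
    return ( x>y ? y+1 : y )
}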

Then run the script on the input file:

$ awk -f tst.awk file
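
One thing to check: ceil() always rounds up, so in Alignment_3 the value 125 out of 290 (about 43.1%) would be printed as 44%, not the 43% shown in the example in the question. If rounding to the nearest integer is what you want, here is a minimal sketch of a variant. It is not Ed Morton's code, and the function name pct_by_group is just a placeholder I made up. It rounds with int(x + 0.5) and keeps each row's original text so the spacing of the input is preserved.

```shell
# Hypothetical helper name; reads from the file arguments, or from stdin if none are given.
pct_by_group() {
    awk '
    # Print the buffered group: header, then each row with its rounded share.
    function prt(    i) {
        if (key == "") return
        print key
        for (i = 1; i <= cnt; i++)
            print rows[i], int((tot ? 100 * vals[i] / tot : 0) + 0.5) "%"
    }
    /^>/ { prt(); key = $0; cnt = tot = 0; next }           # header starts a new group
    NF   { rows[++cnt] = $0; vals[cnt] = $2; tot += $2 }    # keep the row text as-is
    END  { prt() }
    ' "$@"
}
```

Call it as pct_by_group file. In this sketch, rows that appear before the first header are ignored, and a group whose total is zero gets 0% on every row.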