Sum the values based on the variant names and give the result in stats format
19 months ago

I have a text file in which the contents are the following:

      3 synonymous_variant
      1 missense_variant
      1 EFFECT
      1 downstream_gene_variant
      6 missense_variant
      2 upstream_gene_variant
      2 synonymous_variant
      1 EFFECT
      1 downstream_gene_variant
      4 missense_variant
      3 synonymous_variant
      1 upstream_gene_variant
      1 EFFECT
      1 downstream_gene_variant
      3 synonymous_variant
      3 missense_variant
      1 EFFECT
      4 synonymous_variant
      3 missense_variant
      1 EFFECT
      1 downstream_gene_variant
      6 missense_variant
      1 synonymous_variant
      1 EFFECT
      1 downstream_gene_variant
      3 missense_variant
      1 EFFECT
      1 downstream_gene_variant
      4 synonymous_variant
      4 missense_variant
      1 EFFECT
      2 missense_variant
      1 upstream_gene_variant

From this, I need the following result:

missense_variant  its total
downstream variant  its total
upstream variant  its total
....etc

I tried it but did not get the correct result. Can anyone please tell me how to do it in Python, shell, or any other language? Thanks in advance!

coding • 1.3k views

What have you tried? This should be straightforward in awk. With R, this should be even simpler.


I tried with Python, but it was giving me the total of all variants. Can you please tell me how to do it using awk?


What did you try with python? Did you make a dict from column two and then sum column 1 for each unique column 2 key?


I did this:

data = {}

with open('sorted_effect_distribution.txt', 'r') as f:
    for line in f:
        name, value = line.strip().split()
        if name in data:
            data[name] += int(value[0])
        else:
            data[name] = int(value[0])

for name, value in data.items():
    print(f"{name}: {value}")

Please give the command in awk. It would be really helpful.


No. It's a good exercise for you. Search online on how to use awk dictionaries (associative arrays).


Please let others comment on this. Thanks for your time.


I'm not stopping anyone from commenting. Most people are ignoring the post; I'm simply taking the time to tell you that you're better off following a certain path.


You have the columns inverted - shouldn't you be doing value, name = line.strip().split()?
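
To make that suggestion concrete, here is a minimal sketch of the attempt with the unpacking order swapped. It also converts the whole count field instead of only its first character (`value[0]`), which would matter for counts of 10 or more. The function name `sum_by_variant` and the sample lines are only illustrative, and the sketch assumes every data line has exactly two whitespace-separated fields, as in the file shown in the question.

```python
from collections import defaultdict


def sum_by_variant(lines):
    """Sum the count in column 1 for each name in column 2."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines (assumption: 2 fields per row)
        value, name = fields  # the count comes first, then the variant name
        totals[name] += int(value)  # convert the whole field, not just value[0]
    return dict(totals)


# A few lines copied from the question, used here as sample input.
sample = [
    "      3 synonymous_variant",
    "      1 missense_variant",
    "      1 EFFECT",
    "      1 downstream_gene_variant",
    "      6 missense_variant",
]

for name, total in sum_by_variant(sample).items():
    print(f"{name}\t{total}")
```

To run it on the real file, pass the open file handle instead of the sample list, for example `with open('sorted_effect_distribution.txt') as f: totals = sum_by_variant(f)`. The `EFFECT` rows are summed like any other name; if they are header lines rather than variants, they can be dropped from the result afterwards.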
