How to count elements with a specific condition in csv file using python
3.0 years ago
mumdooh • 0

I am still learning Python. I have a table in CSV format with n columns, where each column header is a Tax_id and every column contains species names, like this:

                     9606              9606.1              508771
    0                root                root                root
    1  cellular organisms  cellular organisms  cellular organisms
    2           Eukaryota           Eukaryota           Eukaryota
    3        Opisthokonta        Opisthokonta                 Sar
    4             Metazoa             Metazoa           Alveolata
    5           Eumetazoa           Eumetazoa         Apicomplexa
    6           Bilateria           Bilateria         Conoidasida
    7       Deuterostomia       Deuterostomia            Coccidia
    8            Chordata            Chordata      Eucoccidiorida
    9            Craniata            Craniata         Eimeriorina

I am struggling to write Python code that counts how often each species name occurs, but only in the columns that contain the name "Metazoa".

It should return something like:

        Eumetazoa 2
        Bilateria 2
        Craniata  2
code python csv • 5.4k views
3.0 years ago
Ram 43k

You may want to use a dictionary and loop through the column in question, or use pandas and some sort of group-by function. For the pandas approach, see this post on StackOverflow.
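The dictionary-and-loop idea could look roughly like the sketch below. It uses only the standard library (`csv` and `collections.Counter`, a dictionary subclass). The function name `count_names_in_columns_with` and the inline sample text are made up for illustration; with a real file you would pass an open file handle instead of `io.StringIO`.

```python
import csv
import io
from collections import Counter

# Sample data copied from the question. With a real file, use
# open("your_table.csv", newline="") instead of io.StringIO (file name assumed).
csv_text = """9606,9606.1,508771
root,root,root
cellular organisms,cellular organisms,cellular organisms
Eukaryota,Eukaryota,Eukaryota
Opisthokonta,Opisthokonta,Sar
Metazoa,Metazoa,Alveolata
Eumetazoa,Eumetazoa,Apicomplexa
Bilateria,Bilateria,Conoidasida
Deuterostomia,Deuterostomia,Coccidia
Chordata,Chordata,Eucoccidiorida
Craniata,Craniata,Eimeriorina
"""


def count_names_in_columns_with(handle, target):
    """Count every name found in the columns that contain `target`."""
    rows = list(csv.reader(handle))
    body = rows[1:]                # skip the Tax_id header row
    columns = list(zip(*body))     # transpose rows into columns
    counts = Counter()
    for column in columns:
        if target in column:       # keep only columns that mention the target
            counts.update(column)
    return counts


counts = count_names_in_columns_with(io.StringIO(csv_text), "Metazoa")
for name, n in counts.items():
    print(name, n)
```

Here the third column (508771) is skipped because it never contains "Metazoa", so names such as "Alveolata" are not counted.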

3.0 years ago

Here's one way, perhaps:

#!/usr/bin/env python

'''
so9467093.py
'''

import io
import pandas as pd

# Sample data from the question, inlined so the script runs on its own.
input_str = '''9606,9606.1,508771
root,root,root
"cellular organisms","cellular organisms","cellular organisms"
Eukaryota,Eukaryota,Eukaryota
Opisthokonta,Opisthokonta,Sar
Metazoa,Metazoa,Alveolata
Eumetazoa,Eumetazoa,Apicomplexa
Bilateria,Bilateria,Conoidasida
Deuterostomia,Deuterostomia,Coccidia
Chordata,Chordata,Eucoccidiorida
Craniata,Craniata,Eimeriorina'''

# Wrap the string in a file-like object (named to avoid shadowing the builtin input).
input_buffer = io.StringIO(input_str)
df = pd.read_csv(input_buffer, sep=",")

# Boolean mask: True for each column that contains 'Metazoa' in any row.
filtered_column_names = (df == 'Metazoa').any(axis=0)
subset = df[df.columns[filtered_column_names]]

# Flatten the kept columns into one Series and count each name.
print(subset.stack().value_counts())

Output:

% ./so9467093.py
root                  2
Bilateria             2
Eukaryota             2
Deuterostomia         2
Chordata              2
Eumetazoa             2
Metazoa               2
cellular organisms    2
Craniata              2
Opisthokonta          2
dtype: int64
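
The output above counts every name in the matching columns, including "Metazoa" itself and the ranks above it. The expected output in the question lists only names that come after "Metazoa", so if that is what is wanted, a variation along these lines might help. This is a sketch under that assumption, not a definitive reading of the question; the variable names are arbitrary.

```python
import io
import pandas as pd

# Same sample data as in the question.
csv_text = """9606,9606.1,508771
root,root,root
cellular organisms,cellular organisms,cellular organisms
Eukaryota,Eukaryota,Eukaryota
Opisthokonta,Opisthokonta,Sar
Metazoa,Metazoa,Alveolata
Eumetazoa,Eumetazoa,Apicomplexa
Bilateria,Bilateria,Conoidasida
Deuterostomia,Deuterostomia,Coccidia
Chordata,Chordata,Eucoccidiorida
Craniata,Craniata,Eimeriorina
"""

df = pd.read_csv(io.StringIO(csv_text))

# Keep only the columns that contain "Metazoa" somewhere.
has_target = (df == "Metazoa").any(axis=0)
subset = df.loc[:, has_target]

# From each kept column, take only the rows after the first "Metazoa".
pieces = []
for name in subset.columns:
    column = subset[name].reset_index(drop=True)
    first_hit = (column == "Metazoa").to_numpy().argmax()  # position of first match
    pieces.append(column.iloc[first_hit + 1:])

# Pool the remaining names and count them.
counts = pd.concat(pieces).value_counts()
print(counts)
```

With the sample table, only Eumetazoa, Bilateria, Deuterostomia, Chordata and Craniata remain, each with a count of 2.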