Question: How to use gene names as headings of an array in NumPy (Python)?
S AR (Pakistan) wrote, 2.1 years ago:

I have a table like below:

    Gene name   4h     12h    24h    48h
    A2M         0.12   0.08   0.06   0.02
    FOS         0.01   0.07   0.11   0.09
    BRCA2       0.03   0.04   0.04   0.02
    CPOX        0.05   0.09   0.11   0.14

I built arrays from it like this:

import numpy as np
genelst = np.array(["A2M", "FOS", "BRCA2","CPOX"])
a2m =np.array([[0.12,0.08,0.06,0.02]])
fos = np.array([[0.01,0.07,0.11,0.09]])
brca2 = np.array([[0.03,0.04,0.04,0.02]])
cpox = np.array([[0.05,0.09,0.11,0.14]])
comb_array = np.vstack([genelst, a2m,fos,brca2,cpox])

Now I want to find which gene has the maximum mean expression value, and sort the gene names from high to low mean expression.

I did:

mean_a2m = np.mean(a2m)
mean_fos = np.mean(fos)
mean_brca2 = np.mean(brca2)
mean_cpox = np.mean(cpox)
mean_expression_gene = np.vstack([[mean_a2m,mean_fos,mean_brca2,mean_cpox]])
mean_expression_gene_array = np.vstack([[genelst], [mean_a2m,mean_fos,mean_brca2,mean_cpox]])
print ("The mean expression value for A2M is:" + str(mean_a2m))
print ("The mean expression value for FOS is:" + str(mean_fos))
print ("The mean expression value for BRCA2 is:" + str(mean_brca2))
print ("The mean expression value for CPOX is:" + str(mean_cpox))
mean_expression_gene.max()
mean_expression_gene.sort()

This only gives me the maximum value (0.0975), not the gene name. How can I make the array treat the gene names as headers, so that they are not counted as values and I don't have to separate them out before calling .max()?

Secondly, instead of using a separate 1D array for each gene (a2m, fos, brca2, cpox) to calculate the average, is there a way to get the average of a row or a column of a 2D array, in this case comb_array?

numpy python array • 771 views

Don't deal with multiple arrays; look at how to build a pandas DataFrame instead.

written 2.1 years ago by Bastien Hervé
Bastien Hervé (Karolinska Institutet, Sweden) wrote, 2.1 years ago:
#import pandas
import pandas as pd
#Create your data
d = {'4h': [0.12,0.01,0.03,0.05], '12h': [0.08,0.07,0.04,0.09], '24h': [0.06,0.11,0.04,0.11], '48h':[0.02,0.09,0.02,0.14]}
#Generate a dataframe with your data and the index accordingly
df = pd.DataFrame(data=d, index=['A2M', 'FOS', 'BRCA2', 'CPOX'])

#df
#        12h   24h   48h    4h
#A2M    0.08  0.06  0.02  0.12
#FOS    0.07  0.11  0.09  0.01
#BRCA2  0.04  0.04  0.02  0.03
#CPOX   0.09  0.11  0.14  0.05

#Create a new 'mean' column
df['mean'] = df.mean(axis=1)
#Sort your dataframe on this new column, with decreasing mean value (ascending=False)
df = df.sort_values(["mean"], ascending=False)

#df
#        12h   24h   48h    4h    mean
#CPOX   0.09  0.11  0.14  0.05  0.0975
#A2M    0.08  0.06  0.02  0.12  0.0700
#FOS    0.07  0.11  0.09  0.01  0.0700
#BRCA2  0.04  0.04  0.02  0.03  0.0325

#Read all df lines
for index, row in df.iterrows():
    print("The mean expression value for "+index+" is: "+str(row['mean']))

#The mean expression value for CPOX is: 0.0975
#The mean expression value for A2M is: 0.07
#The mean expression value for FOS is: 0.07
#The mean expression value for BRCA2 is: 0.0325

Wow... That's great. But can I do it without pandas, using only NumPy? This is for an assignment and I can't use pandas right now. Can you give a solution in NumPy?

written 2.1 years ago by S AR

You should have mentioned in your initial post that this is an assignment.

written 2.1 years ago by Bastien Hervé

You can also create a dictionary keyed by gene name, where each value is a NumPy array.
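
A minimal sketch of that dictionary idea, using the values from the question's table. The names expression, gene_means, ranked_genes and top_gene are hypothetical, chosen only for this illustration:

```python
import numpy as np

# Expression values per gene at 4h, 12h, 24h, 48h (from the question's table)
expression = {
    "A2M":   np.array([0.12, 0.08, 0.06, 0.02]),
    "FOS":   np.array([0.01, 0.07, 0.11, 0.09]),
    "BRCA2": np.array([0.03, 0.04, 0.04, 0.02]),
    "CPOX":  np.array([0.05, 0.09, 0.11, 0.14]),
}

# Mean expression per gene, still keyed by gene name
gene_means = {gene: values.mean() for gene, values in expression.items()}

# Gene names ordered from highest to lowest mean expression
ranked_genes = sorted(gene_means, key=gene_means.get, reverse=True)

# Gene name (not the value) with the highest mean
top_gene = max(gene_means, key=gene_means.get)

print("Gene with the highest mean expression:", top_gene)
for gene in ranked_genes:
    print(f"The mean expression value for {gene} is: {gene_means[gene]:.4f}")
```

Because the gene name is the dictionary key, it stays attached to its mean, so max() and sorted() can return names rather than bare numbers.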

written 2.1 years ago by Bastien Hervé
df['mean_exp_per_time'] = df.mean(axis=0)
df['mean_exp_per_gene'] = df.mean(axis=1)
df

When I calculate the column means after the row means, or vice versa, it gives me:

        4h    12h   24h   48h   mean_exp_per_interval  mean_exp_per_gene
A2M     0.12  0.08  0.06  0.02  NaN                    0.0700
FOS     0.01  0.07  0.11  0.09  NaN                    0.0700
BRCA2   0.03  0.04  0.04  0.02  NaN                    0.0325
CPOX    0.05  0.09  0.11  0.14  NaN                    0.0975


        4h    12h   24h   48h   mean_exp_per_gene  mean_exp_per_time
A2M     0.12  0.08  0.06  0.02  0.0700             NaN
FOS     0.01  0.07  0.11  0.09  0.0700             NaN
BRCA2   0.03  0.04  0.04  0.02  0.0325             NaN
CPOX    0.05  0.09  0.11  0.14  0.0975             NaN
written 2.1 years ago by S AR
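
The NaN column most likely comes from index alignment: df.mean(axis=0) returns a Series labelled by the column names (4h, 12h, ...), while assigning to a new column matches on the row labels (the gene names), so nothing lines up. A sketch of one way around it, assuming the same data as in the answer and keeping the per-time-point means in a separate Series:

```python
import pandas as pd

d = {'4h': [0.12, 0.01, 0.03, 0.05], '12h': [0.08, 0.07, 0.04, 0.09],
     '24h': [0.06, 0.11, 0.04, 0.11], '48h': [0.02, 0.09, 0.02, 0.14]}
df = pd.DataFrame(data=d, index=['A2M', 'FOS', 'BRCA2', 'CPOX'])

# Compute both means from the raw data before adding any column,
# so that neither mean is included in the other.
mean_exp_per_time = df.mean(axis=0)   # Series indexed by time point
mean_exp_per_gene = df.mean(axis=1)   # Series indexed by gene name

# The per-gene means share the row index, so they fit as a new column.
df['mean_exp_per_gene'] = mean_exp_per_gene

print(df)
print(mean_exp_per_time)
```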

Secondly, if I want to find which gene shows the maximum mean expression using .max(), it just shows the value, not the gene name.

written 2.1 years ago by S AR

Your gene name is contained only in your variable name, which cannot be printed. I don't know if you can add an index to a NumPy array, maybe... But it is not the best solution.
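
One NumPy-only possibility, sketched here under the assumption that the names live in their own array, parallel to a purely numeric 2D array (genes, values, row_means, col_means, order and sorted_genes are hypothetical names). np.argmax and np.argsort return positions, which can then be used to look up the names:

```python
import numpy as np

genes = np.array(["A2M", "FOS", "BRCA2", "CPOX"])
# Rows follow the order of `genes`; columns are 4h, 12h, 24h, 48h
values = np.array([
    [0.12, 0.08, 0.06, 0.02],
    [0.01, 0.07, 0.11, 0.09],
    [0.03, 0.04, 0.04, 0.02],
    [0.05, 0.09, 0.11, 0.14],
])

row_means = values.mean(axis=1)   # one mean per gene
col_means = values.mean(axis=0)   # one mean per time point

# argmax gives the position of the largest mean; use it to look up the name
top_gene = genes[np.argmax(row_means)]

# argsort sorts ascending, so reverse it for high-to-low order
order = np.argsort(row_means)[::-1]
sorted_genes = genes[order]

print("Gene with the highest mean expression:", top_gene)
for gene, mean in zip(sorted_genes, row_means[order]):
    print(f"The mean expression value for {gene} is: {mean:.4f}")
```

Keeping the names out of the numeric array avoids the conversion of every value to a string that happens when names and numbers are stacked together with np.vstack.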

written 2.1 years ago by Bastien Hervé

And if I use a loop over the columns to get the mean:

for index, col in df.columns():
    print("The mean expression value for "+index+" is: "+str(col['mean_exp_per_time']))

It is giving the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-109-8ac821bb44df> in <module>()
----> 1 for index, col in df.columns():
      2     print("The mean expression value for "+index+" is: "+str(col['mean_exp_per_time']))

TypeError: 'Index' object is not callable
written 2.1 years ago by S AR
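
The TypeError arises because df.columns is an attribute (an Index object), not a method, so it cannot be called with parentheses. A sketch of one way to print the mean per time point, assuming the same data as in the answer and iterating over the Series returned by df.mean(axis=0); col_means, time_point and mean_value are hypothetical names:

```python
import pandas as pd

d = {'4h': [0.12, 0.01, 0.03, 0.05], '12h': [0.08, 0.07, 0.04, 0.09],
     '24h': [0.06, 0.11, 0.04, 0.11], '48h': [0.02, 0.09, 0.02, 0.14]}
df = pd.DataFrame(data=d, index=['A2M', 'FOS', 'BRCA2', 'CPOX'])

# Column means form a Series whose labels are the time points;
# .items() yields (label, value) pairs.
col_means = df.mean(axis=0)
for time_point, mean_value in col_means.items():
    print("The mean expression value for " + time_point + " is: " + str(mean_value))
```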