Question: How to get the gene name, not just the value, from a NumPy array with headings (Python)?
S AR50 (Pakistan) wrote, 2.1 years ago:

I have a table like the one below:

``````
Gene name  4h    12h   24h   48h
A2M        0.12  0.08  0.06  0.02
FOS        0.01  0.07  0.11  0.09
BRCA2      0.03  0.04  0.04  0.02
CPOX       0.05  0.09  0.11  0.14
``````

I turned it into NumPy arrays like this:

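``````
import numpy as np
genelst = np.array(["A2M", "FOS", "BRCA2", "CPOX"])
a2m = np.array([[0.12, 0.08, 0.06, 0.02]])
fos = np.array([[0.01, 0.07, 0.11, 0.09]])
brca2 = np.array([[0.03, 0.04, 0.04, 0.02]])
cpox = np.array([[0.05, 0.09, 0.11, 0.14]])
# Note: stacking the string array together with the float arrays typically makes
# NumPy store every element of comb_array as a string, not as a number
comb_array = np.vstack([genelst, a2m, fos, brca2, cpox])
``````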

Now I want to find which gene has the maximum mean expression value, and sort the gene names from the highest to the lowest mean expression.

I did:

``````
mean_a2m = np.mean(a2m)
mean_fos = np.mean(fos)
mean_brca2 = np.mean(brca2)
mean_cpox = np.mean(cpox)
mean_expression_gene = np.vstack([[mean_a2m, mean_fos, mean_brca2, mean_cpox]])
mean_expression_gene_array = np.vstack([[genelst], [mean_a2m, mean_fos, mean_brca2, mean_cpox]])
print("The mean expression value for A2M is:" + str(mean_a2m))
print("The mean expression value for FOS is:" + str(mean_fos))
print("The mean expression value for BRCA2 is:" + str(mean_brca2))
print("The mean expression value for CPOX is:" + str(mean_cpox))
mean_expression_gene.max()
mean_expression_gene.sort()
``````

This just gives me the maximum value (0.0975), not the gene name. Also, how can I make the array treat the gene names as a header, so that they are not counted as values? At the moment I have to separate the gene names from the numbers before I can use `.max()`.
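A minimal NumPy-only sketch for the first question, assuming the gene names stay in their own 1-D array and the per-gene means are collected into a second 1-D array in the same order. `np.argmax` and `np.argsort` return positions rather than values, and those positions can index `genelst` to recover the names. (The per-gene arrays are written as 1-D here for simplicity; the name `sorted_genes` is illustrative.)

```python
import numpy as np

genelst = np.array(["A2M", "FOS", "BRCA2", "CPOX"])
a2m = np.array([0.12, 0.08, 0.06, 0.02])
fos = np.array([0.01, 0.07, 0.11, 0.09])
brca2 = np.array([0.03, 0.04, 0.04, 0.02])
cpox = np.array([0.05, 0.09, 0.11, 0.14])

# One mean per gene, in the SAME order as genelst
means = np.array([np.mean(a2m), np.mean(fos), np.mean(brca2), np.mean(cpox)])

# np.argmax returns the position of the largest mean; use it to index the names
best_gene = genelst[np.argmax(means)]
print("Gene with the highest mean expression:", best_gene)  # prints: ... CPOX

# np.argsort sorts ascending; [::-1] reverses it to get high-to-low
order = np.argsort(means)[::-1]
sorted_genes = genelst[order]
sorted_means = means[order]
for gene, m in zip(sorted_genes, sorted_means):
    print(gene, m)
```

A2M and FOS have (near-)equal means in this data, so their relative order in the ranking is not guaranteed.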

Secondly, instead of using a separate array for each gene (a2m, fos, brca2, cpox) to calculate the average, is there a way to get the average of a row or a column of a 2-D array, in this case comb_array?
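A sketch for the second question, assuming the numbers go into one 2-D float array and the labels stay in separate arrays (`genes`, `times` and `expr` are names introduced here). Mixing the gene names into the same array, as `np.vstack([genelst, ...])` does, turns it into a string array, so numeric means cannot be taken on it directly. With a float array, `axis=1` averages across each row (one mean per gene) and `axis=0` averages down each column (one mean per time point).

```python
import numpy as np

genes = np.array(["A2M", "FOS", "BRCA2", "CPOX"])
times = np.array(["4h", "12h", "24h", "48h"])
# Rows = genes, columns = time points; numbers only, so the dtype stays float
expr = np.array([
    [0.12, 0.08, 0.06, 0.02],  # A2M
    [0.01, 0.07, 0.11, 0.09],  # FOS
    [0.03, 0.04, 0.04, 0.02],  # BRCA2
    [0.05, 0.09, 0.11, 0.14],  # CPOX
])

gene_means = expr.mean(axis=1)  # average across each row -> one value per gene
time_means = expr.mean(axis=0)  # average down each column -> one value per time point

for gene, m in zip(genes, gene_means):
    print(f"The mean expression value for {gene} is: {m:.4f}")
for t, m in zip(times, time_means):
    print(f"The mean expression value at {t} is: {m:.4f}")
```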

modified 2.1 years ago by Bastien Hervé • written 2.1 years ago by S AR50

Don't juggle multiple separate arrays; look into how to build a DataFrame.

Bastien Hervé (Karolinska Institutet, Sweden) wrote, 2.1 years ago:
``````
# Import pandas
import pandas as pd
d = {'4h': [0.12, 0.01, 0.03, 0.05], '12h': [0.08, 0.07, 0.04, 0.09], '24h': [0.06, 0.11, 0.04, 0.11], '48h': [0.02, 0.09, 0.02, 0.14]}
# Generate a DataFrame with your data and the gene names as the index
df = pd.DataFrame(data=d, index=['A2M', 'FOS', 'BRCA2', 'CPOX'])

# df
#          4h   12h   24h   48h
# A2M    0.12  0.08  0.06  0.02
# FOS    0.01  0.07  0.11  0.09
# BRCA2  0.03  0.04  0.04  0.02
# CPOX   0.05  0.09  0.11  0.14
# (older pandas versions sorted the dict keys, showing the columns as 12h 24h 48h 4h)

# Create a new 'mean' column (mean across each row, i.e. one value per gene)
df['mean'] = df.mean(axis=1)
# Sort the DataFrame on this new column, highest mean first (ascending=False)
df = df.sort_values(["mean"], ascending=False)

# df
#          4h   12h   24h   48h    mean
# CPOX   0.05  0.09  0.11  0.14  0.0975
# A2M    0.12  0.08  0.06  0.02  0.0700
# FOS    0.01  0.07  0.11  0.09  0.0700
# BRCA2  0.03  0.04  0.04  0.02  0.0325

for index, row in df.iterrows():
    print("The mean expression value for " + index + " is: " + str(row['mean']))

# The mean expression value for CPOX is: 0.0975
# The mean expression value for A2M is: 0.07
# The mean expression value for FOS is: 0.07
# The mean expression value for BRCA2 is: 0.0325
``````

Wow, that's great. But can I do it without pandas, using only NumPy? This is an assignment and I can't use pandas right now, so could you give a NumPy-only solution?

You should have mentioned in your initial post that this is an assignment.

You can also create a dictionary with the genes as keys, where each value is a NumPy array.
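A minimal sketch of that dictionary idea (the names `expression`, `mean_by_gene` and `ranking` are illustrative): because the gene name is the key, `max()` and `sorted()` with a `key=` function can rank genes by their mean and still return the name.

```python
import numpy as np

# Gene name -> NumPy array of its expression values (4h, 12h, 24h, 48h)
expression = {
    "A2M": np.array([0.12, 0.08, 0.06, 0.02]),
    "FOS": np.array([0.01, 0.07, 0.11, 0.09]),
    "BRCA2": np.array([0.03, 0.04, 0.04, 0.02]),
    "CPOX": np.array([0.05, 0.09, 0.11, 0.14]),
}

# Mean per gene; the key carries the name, so it is never lost
mean_by_gene = {gene: values.mean() for gene, values in expression.items()}

# max() compares the mean values (via key=) but returns the gene name
best_gene = max(mean_by_gene, key=mean_by_gene.get)

# (name, mean) pairs sorted by mean, highest first
ranking = sorted(mean_by_gene.items(), key=lambda item: item[1], reverse=True)
for gene, m in ranking:
    print("The mean expression value for " + gene + " is: " + str(m))
```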

``````
df['mean_exp_per_time'] = df.mean(axis=0)
df['mean_exp_per_gene'] = df.mean(axis=1)
df
``````

When I calculate the column means as well as the row means (in either order), I get:

``````
        4h   12h   24h   48h  mean_exp_per_interval  mean_exp_per_gene
A2M   0.12  0.08  0.06  0.02                    NaN             0.0700
FOS   0.01  0.07  0.11  0.09                    NaN             0.0700
BRCA2 0.03  0.04  0.04  0.02                    NaN             0.0325
CPOX  0.05  0.09  0.11  0.14                    NaN             0.0975

        4h   12h   24h   48h  mean_exp_per_gene  mean_exp_per_time
A2M   0.12  0.08  0.06  0.02             0.0700                NaN
FOS   0.01  0.07  0.11  0.09             0.0700                NaN
BRCA2 0.03  0.04  0.04  0.02             0.0325                NaN
CPOX  0.05  0.09  0.11  0.14             0.0975                NaN
``````
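The NaN column comes from index alignment: `df.mean(axis=0)` returns a Series indexed by the column labels (`4h`, `12h`, ...), and assigning a Series as a new column aligns it on the row labels (`A2M`, `FOS`, ...). No labels match, so every cell becomes NaN. Row means are indexed by gene, which is why `mean_exp_per_gene` works. A sketch of one fix, storing the per-time means as an extra row instead of a column (the row label `mean_exp_per_time` is reused from the question; `time_cols` is a name introduced here):

```python
import pandas as pd

d = {'4h': [0.12, 0.01, 0.03, 0.05], '12h': [0.08, 0.07, 0.04, 0.09],
     '24h': [0.06, 0.11, 0.04, 0.11], '48h': [0.02, 0.09, 0.02, 0.14]}
df = pd.DataFrame(data=d, index=['A2M', 'FOS', 'BRCA2', 'CPOX'])
time_cols = ['4h', '12h', '24h', '48h']

# Per-time means: a Series indexed by COLUMN labels ('4h', '12h', ...)
time_means = df[time_cols].mean(axis=0)
# Per-gene means: a Series indexed by ROW labels, so it aligns as a new column
df['mean_exp_per_gene'] = df[time_cols].mean(axis=1)

# Add the per-time means as a new ROW; pandas aligns the Series on the columns,
# so the 'mean_exp_per_gene' cell of this row is left as NaN
df.loc['mean_exp_per_time'] = time_means
print(df)
```

Keeping `time_means` as its own Series is an equally valid choice, and it avoids mixing a summary row in with the gene rows.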

Secondly, if I want to find which gene shows the maximum mean expression using `.max()`, it just shows the value, not the gene name.
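In pandas, `.max()` returns the largest value, while `Series.idxmax()` returns the index label where it occurs, i.e. the gene name. A sketch reusing the DataFrame built in the pandas answer (`best_gene` and `genes_high_to_low` are illustrative names):

```python
import pandas as pd

d = {'4h': [0.12, 0.01, 0.03, 0.05], '12h': [0.08, 0.07, 0.04, 0.09],
     '24h': [0.06, 0.11, 0.04, 0.11], '48h': [0.02, 0.09, 0.02, 0.14]}
df = pd.DataFrame(data=d, index=['A2M', 'FOS', 'BRCA2', 'CPOX'])
df['mean'] = df.mean(axis=1)

# .max() gives the largest VALUE; .idxmax() gives its index LABEL (the gene name)
best_value = df['mean'].max()
best_gene = df['mean'].idxmax()
print(best_gene, best_value)

# Gene names ordered from the highest to the lowest mean
genes_high_to_low = df['mean'].sort_values(ascending=False).index.tolist()
print(genes_high_to_low)
```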

Your gene name is contained only in your variable name, which cannot be printed. I don't know whether you can add an index to a NumPy array, maybe... but it is not the best solution.
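One NumPy feature that comes close to a labelled table is a structured array, where each field has a name and its own dtype, so the gene names can live in a text field next to the numbers. A sketch, assuming a `gene` field with a 10-character string width (`U10`) and field names copied from the table headings; the numeric fields are stacked back into a plain float array for the means:

```python
import numpy as np

# Structured dtype: a text field for the gene name, one float field per time point
dtype = [("gene", "U10"), ("4h", "f8"), ("12h", "f8"), ("24h", "f8"), ("48h", "f8")]
table = np.array([
    ("A2M", 0.12, 0.08, 0.06, 0.02),
    ("FOS", 0.01, 0.07, 0.11, 0.09),
    ("BRCA2", 0.03, 0.04, 0.04, 0.02),
    ("CPOX", 0.05, 0.09, 0.11, 0.14),
], dtype=dtype)

time_fields = ["4h", "12h", "24h", "48h"]
# Pull the numeric fields out into a plain (4, 4) float array: rows = genes
values = np.column_stack([table[name] for name in time_fields])
means = values.mean(axis=1)

# Positions from argmax/argsort index the 'gene' field to recover names
best_gene = table["gene"][np.argmax(means)]
ranked = table["gene"][np.argsort(means)[::-1]]
print(best_gene, ranked)
```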

And if I use a loop over the columns to get the means:

``````
for index, col in df.columns():
    print("The mean expression value for "+index+" is: "+str(col['mean_exp_per_time']))
``````

It gives the following error:

``````
TypeError                                 Traceback (most recent call last)
<ipython-input-109-8ac821bb44df> in <module>()
----> 1 for index, col in df.columns():
2     print("The mean expression value for "+index+" is: "+str(col['mean_exp_per_time']))

TypeError: 'Index' object is not callable
``````
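`df.columns` is an attribute holding an `Index` of the column labels, not a method, so calling it with `()` raises `TypeError: 'Index' object is not callable`. Iterating over `df.columns` without parentheses would yield only the labels, not (label, value) pairs. A sketch that computes the per-time means once and loops over them with `Series.items()` (`messages` is an illustrative name used to collect the lines):

```python
import pandas as pd

d = {'4h': [0.12, 0.01, 0.03, 0.05], '12h': [0.08, 0.07, 0.04, 0.09],
     '24h': [0.06, 0.11, 0.04, 0.11], '48h': [0.02, 0.09, 0.02, 0.14]}
df = pd.DataFrame(data=d, index=['A2M', 'FOS', 'BRCA2', 'CPOX'])

# One mean per column; the result is a Series indexed by column label
time_means = df.mean(axis=0)

messages = []
# Series.items() yields (label, value) pairs
for label, value in time_means.items():
    msg = "The mean expression value for " + label + " is: " + str(value)
    print(msg)
    messages.append(msg)
```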
