4.4 years ago
modarzi
Hi, my question may already have been asked by someone else, but I have the microarray gene expression data below as the data frame 'myExprdat':
ID Gene symbol Sample1 Sample2 Sample3 Sample4
1 1007_s_at MIR4640///DDR1 108.38 321.8 66.72 19.43
2 1053_at RFC2 121.1 148.06 306.55 242.19
3 117_at HSPA6 107.63 59.71 163.14 24.42
4 121_at CYP2E1 8.51 4.72 4.79 10.78
5 1255_g_at GUCA1A 4.23 4.26 4.26 4.26
6 1294_at MIR5193///UBA7 131.6 82.71 191.34 70.52
7 1316_at THRA 9 8.17 8.06 7.94
8 1320_at PTPN21 6.45 6.63 6.77 6.87
9 1405_i_at HSPA6 1379.57 215.27 191.34 108.38
10 1431_at CYP2E1 5.94 6.11 6.11 6.06
To combine the rows that share a 'Gene symbol' by taking their mean value, I used the code below:
myExprdat_aggregate <- aggregate(myExprdat[, -c(1,2)],
                                 by = list(Gene = myExprdat$`Gene symbol`),
                                 FUN = mean,
                                 na.rm = TRUE)
I get a data frame with 8 rows (one per gene symbol) and 5 columns (Gene and Sample1 to Sample4), but I don't know why every cell of 'myExprdat_aggregate' is NA.
I would appreciate it if anybody could share their comments with me.
I ran str(myExprdat) and got the results below:
I can't understand what you meant by your last comment:
Hey, well, there is your problem: your numerical values are encoded as factors. You will have to go back a few steps to find out which step is resulting in these numerical values being regarded as factors.
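To illustrate the point about factors, here is a minimal sketch using a hypothetical toy data frame (the values are borrowed from the question, but the way they became factors here is only an assumption). It shows one way to check the class of each column, why mean() returns NA on a factor, and how a factor can be converted back to numbers through its labels rather than its internal integer codes.

```r
# Toy data frame mimicking the problem: expression values stored as factors.
# stringsAsFactors = TRUE reproduces the default behaviour of R before 4.0.
toy <- data.frame(
  ID = c("117_at", "1405_i_at", "1053_at"),
  `Gene symbol` = c("HSPA6", "HSPA6", "RFC2"),
  Sample1 = c("107.63", "1379.57", "121.1"),
  Sample2 = c("59.71", "215.27", "148.06"),
  check.names = FALSE,
  stringsAsFactors = TRUE
)

# Inspect the class of every column; expression columns should be "numeric".
col_classes <- sapply(toy, class)
print(col_classes)

# mean() cannot handle a factor: it warns and returns NA.
factor_mean <- suppressWarnings(mean(toy$Sample1))

# Convert via the labels (as.character), not the integer codes.
sample1_numeric <- as.numeric(as.character(toy$Sample1))
print(sample1_numeric)
```

If the factors come from read.table() or read.csv(), passing stringsAsFactors = FALSE at import is one place to look, although a column that still arrives as character usually means some cells contain non-numeric text.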
For the second comment, I mean that you just have to do:
Dear Dr. Blighe, I went back and found why the values were being treated as factors. Now I run
str(myExprdat)
and I get the result below. Then I run
aggregate()
but again I get a data frame full of NA. I really don't know why I get that result.
Similar issue here... now, however, your numerical values are encoded as characters. You will have to encode them as numeric values. Here, I will reproduce your problem and then solve it:
Now, convert the relevant columns to numerical values:
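The code from this reply is not preserved in the thread. As a rough sketch of what such a reproduction and fix might look like, the example below rebuilds a subset of the data from the question with the sample columns stored as character strings (an assumption about how the problem arose), converts them to numeric, and then runs the same aggregate() call. The helper name sample_cols is introduced here for illustration only.

```r
# Subset of the values from the question, with sample columns stored as
# character strings (assumed cause of the all-NA result).
myExprdat <- data.frame(
  ID = c("1007_s_at", "1053_at", "117_at", "121_at", "1405_i_at", "1431_at"),
  `Gene symbol` = c("MIR4640///DDR1", "RFC2", "HSPA6",
                    "CYP2E1", "HSPA6", "CYP2E1"),
  Sample1 = c("108.38", "121.1", "107.63", "8.51", "1379.57", "5.94"),
  Sample2 = c("321.8", "148.06", "59.71", "4.72", "215.27", "6.11"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Every column except the two annotation columns holds expression values.
sample_cols <- setdiff(colnames(myExprdat), c("ID", "Gene symbol"))

# as.character() first makes the conversion safe for factor columns too.
myExprdat[sample_cols] <- lapply(
  myExprdat[sample_cols],
  function(x) as.numeric(as.character(x))
)

# Average all probes that map to the same gene symbol.
myExprdat_aggregate <- aggregate(
  myExprdat[, sample_cols],
  by = list(Gene = myExprdat$`Gene symbol`),
  FUN = mean,
  na.rm = TRUE
)
print(myExprdat_aggregate)
```

Running str(myExprdat) after the conversion should show num for each sample column. If as.numeric() warns about NAs introduced by coercion, some cells contain text that is not a number and are worth inspecting.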
Thanks for your comment. I have run that code and now have a matrix that aggregates probes with the same gene symbol. However, in the 'Gene' column of the result matrix, the first row has no gene name. In other words, based on the data frame below, the first row has values for each sample but the gene name is missing:
I would appreciate it if you could share your comments with me.
I cannot know the source of that particular problem. Please review all steps of your code, checking both input and output, in order to understand why there may be no gene name there.
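One thing that could be checked while reviewing the input, offered only as a possibility since the thread does not establish the cause: probes with an empty gene symbol. An empty string sorts before any other symbol, so aggregate() would place such a group in the first row with a blank name. The sketch below uses a hypothetical toy data frame (the probe ID "AFFX-hypothetical_at" is made up) to count and drop unannotated probes.

```r
# Hypothetical data: the second probe has an empty gene symbol.
toy <- data.frame(
  ID = c("1053_at", "AFFX-hypothetical_at", "117_at"),
  `Gene symbol` = c("RFC2", "", "HSPA6"),
  Sample1 = c(121.1, 50.0, 107.63),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Treat NA, empty, and whitespace-only symbols as missing.
symbol <- trimws(toy$`Gene symbol`)
has_symbol <- !is.na(symbol) & symbol != ""

# How many probes lack a symbol?
n_missing <- sum(!has_symbol)
print(n_missing)

# Keep only annotated probes before aggregating.
toy_annotated <- toy[has_symbol, ]
print(toy_annotated)
```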
Thanks, Dr. Blighe. I have one more question:
As you can see in the 'Gene symbol' column of my microarray dataset, some genes have two names, e.g. 'MIR4640///DDR1' or 'MIR5193///UBA7'.
What should I do with these gene symbols? Can I remove the second part of the name (the part after '///')? I mean, can I keep just 'MIR4640' or 'MIR5193'?
I would appreciate it if you could share your comments with me.
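Mechanically, the part after '///' can be removed with a regular expression. The sketch below shows that, plus an alternative that keeps every symbol for each probe. Whether discarding the second symbol is appropriate is a separate judgment call that this thread does not answer; keeping the composite name or excluding multi-symbol probes are other options.

```r
symbols <- c("MIR4640///DDR1", "MIR5193///UBA7", "HSPA6")

# Keep only the text before the first '///'; symbols without it are unchanged.
first_symbol <- sub("///.*$", "", symbols)
print(first_symbol)

# Alternatively, keep every symbol as a character vector per probe.
all_symbols <- strsplit(symbols, "///", fixed = TRUE)
print(all_symbols)
```

Note that simplifying the symbols before calling aggregate() changes which probes are averaged together, since probes are grouped by the exact symbol string.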