I am working on a microarray data set whose platform is GPL96 ([HG-U133A]). I used the GPL96 annotation file to annotate the gene expression data and convert probe IDs into gene identifiers. My problem is that the annotation file has three columns that could serve as gene identifiers: "Gene title", "Gene symbol" and "Gene ID". Because multiple probes sometimes map to the same gene, I collapsed them with the aggregate() function as follows:
my_aggregate_Expr_data <- aggregate(my_Expr_data[, -c(1, 2)], by = list(gene_name = my_Expr_data$`Gene symbol`), FUN = mean, na.rm = TRUE)
In my_Expr_data, the first column is "Prob ID" and the second column is "Gene symbol", so my_Expr_data[, -c(1,2)] drops both and keeps only the expression values.
However, "Gene symbol" is not a good identifier. I need an identifier such as an Ensembl ID, which uniquely identifies each gene, but the GPL96 annotation file has no Ensembl ID column for the probes. Given this, can I use "Gene ID" as a unique identifier for each gene associated with one or more probes? I would appreciate it if anybody could share their thoughts.
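To make the question concrete, here is a minimal sketch of what aggregating by "Gene ID" instead of "Gene symbol" might look like. The toy table, its values, and the names toy_expr, keep and toy_by_gene_id are made up for illustration. The sketch also assumes that probes without a gene have an empty "Gene ID" and that probes mapping to several genes list the IDs separated by "///"; both assumptions should be checked against the actual annotation file.

```r
# Toy expression table (hypothetical values): probe ID, "Gene ID", two samples
toy_expr <- data.frame(
  probe   = c("p1", "p2", "p3", "p4", "p5"),
  gene_id = c("780", "780", "5982", "", "7318///100037417"),
  s1      = c(2, 4, 5, 1, 3),
  s2      = c(6, 8, 7, 2, 9),
  stringsAsFactors = FALSE
)

# Assumption: unannotated probes have an empty/NA "Gene ID",
# and probes hitting several genes separate the IDs with "///"
keep <- !is.na(toy_expr$gene_id) &
  toy_expr$gene_id != "" &
  !grepl("///", toy_expr$gene_id, fixed = TRUE)

# Average the remaining probes per "Gene ID"
toy_by_gene_id <- aggregate(
  toy_expr[keep, c("s1", "s2")],
  by = list(gene_id = toy_expr$gene_id[keep]),
  FUN = mean, na.rm = TRUE
)
print(toy_by_gene_id)
```

Here probes p1 and p2 share one "Gene ID" and are averaged into a single row, while p4 (no gene) and p5 (several genes) are dropped before aggregation. Whether dropping such probes is appropriate is part of what I am unsure about.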