Question: How can I annotate a microarray data set with Ensembl IDs?
6 months ago by
modarzi140 wrote:


I am working on a microarray data set whose platform is GPL96 ([HG-U133A]). I used the GPL96 annotation file to annotate the gene expression data and convert probe IDs into gene identifiers. In the annotation file I see three columns of gene identifiers: "Gene title", "Gene symbol" and "Gene ID". Because multiple probes sometimes map to one gene, I used the aggregate() function as follows:

# Average the expression of all probes that share a gene symbol
my_aggregate_Expr_data <- aggregate(my_Expr_data[, -c(1,2)],
                            by = list(gene_name = my_Expr_data$`Gene symbol`),
                            FUN = mean,
                            na.rm = TRUE)

where the first column of my_Expr_data is "Probe ID" and the second column is "Gene symbol"; my_Expr_data[, -c(1,2)] drops both columns before averaging.
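
To illustrate the collapsing step described above, here is a minimal sketch on invented values. The data frame `toy`, the probe names and the sample columns `GSM1` and `GSM2` are hypothetical; only the column layout (probe ID first, gene symbol second) follows the post.

```r
# Hypothetical toy data: two probes for DDR1, one for RFC2 (values invented)
toy <- data.frame(
  ID = c("probe_a", "probe_b", "probe_c"),
  `Gene symbol` = c("DDR1", "DDR1", "RFC2"),
  GSM1 = c(10, 12, 7),
  GSM2 = c(8, NA, 5),
  check.names = FALSE  # keep the space in "Gene symbol"
)

# Average all probes that share a gene symbol;
# na.rm = TRUE is passed through to mean()
collapsed <- aggregate(toy[, -c(1, 2)],
                       by = list(gene_name = toy$`Gene symbol`),
                       FUN = mean,
                       na.rm = TRUE)
print(collapsed)
```

Note that aggregate() omits rows whose grouping value is NA, so probes without a gene symbol would silently disappear from the result.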

However, "Gene symbol" is not a good identifier. I need an identifier such as the Ensembl ID, which points to a unique position for each gene, but the GPL96 annotation file has no Ensembl ID column for the probes. Given this, can I use "Gene ID" as a unique identifier for each gene associated with one or more probes? I would appreciate it if anybody could share their ideas with me.

Best Regards,

6 months ago by
Kevin Blighe69k (Republic of Ireland) wrote:

Hi, I would use hgu133a.db, assuming that you can get the original probe IDs (stored in probes):


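library(hgu133a.db)  # attaches AnnotationDbi, which provides select()

# gset: the expression object whose row names are the probe IDs
probes <- rownames(gset)

annotLookup <- AnnotationDbi::select(hgu133a.db, keys = probes,
  keytype = 'PROBEID',
  columns = c('PROBEID', 'ENSEMBL', 'SYMBOL'))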

You can also modify the above code to convert gene symbols to Ensembl IDs.



Dear Dr. Blighe

Thanks for your comment. I also added 'ENTREZID' to the columns in your code. However, as you can see below, 1007_s_at maps to 8 Ensembl IDs, with gene symbols DDR1 and MIR4640. So my question is: which Ensembl ID should I use for DDR1 at 1007_s_at? Or how can I work out which Ensembl ID is my target for downstream analysis of the 1007_s_at probe?

I would appreciate it if you could share your thoughts with me.

Best Regards,

    PROBEID     ENSEMBL         SYMBOL  ENTREZID
1   1007_s_at   ENSG00000204580 DDR1    780
2   1007_s_at   ENSG00000223680 DDR1    780
3   1007_s_at   ENSG00000229767 DDR1    780
4   1007_s_at   ENSG00000230456 DDR1    780
5   1007_s_at   ENSG00000234078 DDR1    780
6   1007_s_at   ENSG00000137332 DDR1    780
7   1007_s_at   ENSG00000215522 DDR1    780
8   1007_s_at   ENSG00000284370 MIR4640 100616237
9   1053_at     ENSG00000049541 RFC2    5982
10  117_at      ENSG00000173110 HSPA6   3310
11  121_at      ENSG00000125618 PAX8    7849
12  1255_g_at   ENSG00000048545 GUCA1A  2978
13  1255_g_at   ENSG00000287363 GUCA1A  2978
14  1294_at     ENSG00000182179 UBA7    7318
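
The one-to-many pattern in the rows above can be quantified before settling on an identifier. The sketch below re-enters those 14 rows as a data frame and counts the distinct Ensembl and Entrez IDs per probe. The object name `annot`, the helper `n_unique` and the column names are assumptions modelled on the select() call. It only describes the mapping and does not decide which Ensembl ID is the right one.

```r
# The 14 rows from the lookup output, re-entered by hand (names are assumptions)
annot <- data.frame(
  PROBEID = c(rep("1007_s_at", 8), "1053_at", "117_at", "121_at",
              "1255_g_at", "1255_g_at", "1294_at"),
  ENSEMBL = c("ENSG00000204580", "ENSG00000223680", "ENSG00000229767",
              "ENSG00000230456", "ENSG00000234078", "ENSG00000137332",
              "ENSG00000215522", "ENSG00000284370", "ENSG00000049541",
              "ENSG00000173110", "ENSG00000125618", "ENSG00000048545",
              "ENSG00000287363", "ENSG00000182179"),
  SYMBOL = c(rep("DDR1", 7), "MIR4640", "RFC2", "HSPA6", "PAX8",
             "GUCA1A", "GUCA1A", "UBA7"),
  ENTREZID = c(rep("780", 7), "100616237", "5982", "3310", "7849",
               "2978", "2978", "7318"),
  stringsAsFactors = FALSE
)

# Count distinct identifiers of each type per probe
n_unique <- function(x) length(unique(x))
n_ensembl <- tapply(annot$ENSEMBL, annot$PROBEID, n_unique)
n_entrez  <- tapply(annot$ENTREZID, annot$PROBEID, n_unique)

counts <- data.frame(PROBEID = names(n_ensembl),
                     n_ensembl = as.vector(n_ensembl),
                     n_entrez = as.vector(n_entrez))
print(counts)
```

In these rows every gene symbol corresponds to a single Entrez Gene ID, whereas DDR1 and GUCA1A each carry several Ensembl IDs.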
written 6 months ago by modarzi140