Question

How can I annotate a microarray data set with Ensembl IDs?

0

3.8 years ago

modarzi ▴ 160

Hi,

I am working on a microarray data set from the GPL96 platform ([HG-U133A]). I used the GPL96 annotation file to annotate the gene expression data and convert probe IDs into gene identifiers. The annotation file contains three gene identifier columns: "Gene title", "Gene symbol" and "Gene ID". Because multiple probes sometimes map to one gene, I used the aggregate() function as follows:

# average all probes that share a gene symbol (columns 1 and 2 are identifiers)
my_aggregate_Expr_data <- aggregate(my_Expr_data[, -c(1,2)],
                            by = list(gene_name = my_Expr_data$`Gene symbol`),
                            FUN = mean,
                            na.rm = TRUE)

Here, the first column of my_Expr_data is "Probe ID" and the second column is "Gene symbol", so my_Expr_data[, -c(1,2)] drops both identifier columns before averaging.
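To make the averaging step concrete, here is a minimal sketch on a toy table. The probe IDs, gene symbols and expression values are invented for illustration only and do not come from GPL96.

```r
# Toy expression table; every value here is made up for illustration.
my_Expr_data <- data.frame(
  `Probe ID`    = c("p1", "p2", "p3", "p4"),
  `Gene symbol` = c("GENE_A", "GENE_A", "GENE_B", NA),
  sample1       = c(2, 4, 5, 7),
  sample2       = c(1, NA, 6, 8),
  check.names   = FALSE
)

# Average probes that share a gene symbol; columns 1-2 hold identifiers.
# aggregate() omits rows whose grouping value is NA (probe p4 here),
# and na.rm = TRUE is passed on to mean() for missing expression values.
my_aggregate_Expr_data <- aggregate(
  my_Expr_data[, -c(1, 2)],
  by    = list(gene_name = my_Expr_data$`Gene symbol`),
  FUN   = mean,
  na.rm = TRUE
)

print(my_aggregate_Expr_data)
```

Note that probes without a gene symbol silently disappear from the result, which may or may not be what is wanted.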

However, "Gene symbol" is not a good identifier. I need an identifier such as the Ensembl ID, which indicates a unique position for each gene, but the GPL96 annotation file has no Ensembl ID column for the probes. Given this, can I use "Gene ID" as a unique identifier for each gene associated with one or more probes? I would appreciate it if anybody could share their ideas with me.

Best Regards,

Affymetrix Ensemble Id GPL 96 • 1.5k views

ADD COMMENT • link updated 2.4 years ago by seta ★ 1.9k • written 3.8 years ago by modarzi ▴ 160

score 1 · Answer 1 · 2020-07-10

1

3.8 years ago

Kevin Blighe 87k

Hi, I would use hgu133a.db, assuming that you can get the original probe IDs (stored in probes):

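library(hgu133a.db)

# probe IDs, taken here from the row names of the expression object gset
probes <- rownames(gset)

# look up the Ensembl gene IDs and gene symbols for each probe
annotLookup <- select(hgu133a.db, keys = probes,
  columns = c('PROBEID', 'ENSEMBL', 'SYMBOL'),
  keytype = 'PROBEID')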

You can modify the above code to work for Gene Symbol - to - Ensembl conversion, too.

Kevin

ADD COMMENT • link 3.8 years ago by Kevin Blighe 87k

0

Dear Dr. Blighe

Thanks for your comment. I also added 'ENTREZID' to your code. As you can see below, 1007_s_at maps to 8 Ensembl IDs, with the gene symbols DDR1 and MIR4640. So my exact question is: which Ensembl ID should I use for 1007_s_at and DDR1? Or, by what mechanism can I work out which Ensembl ID is the right target for downstream analysis of the 1007_s_at probe?

I appreciate it if you share your comment with me.

Best Regards,

    PROBEID    ENSEMBL          SYMBOL   ENTREZID
1   1007_s_at  ENSG00000204580  DDR1     780
2   1007_s_at  ENSG00000223680  DDR1     780
3   1007_s_at  ENSG00000229767  DDR1     780
4   1007_s_at  ENSG00000230456  DDR1     780
5   1007_s_at  ENSG00000234078  DDR1     780
6   1007_s_at  ENSG00000137332  DDR1     780
7   1007_s_at  ENSG00000215522  DDR1     780
8   1007_s_at  ENSG00000284370  MIR4640  100616237
9   1053_at    ENSG00000049541  RFC2     5982
10  117_at     ENSG00000173110  HSPA6    3310
11  121_at     ENSG00000125618  PAX8     7849
12  1255_g_at  ENSG00000048545  GUCA1A   2978
13  1255_g_at  ENSG00000287363  GUCA1A   2978
14  1294_at    ENSG00000182179  UBA7     7318
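As a first step, the one-to-many mappings in a lookup table like this can be listed mechanically. The sketch below uses a few rows copied from the table above. Dropping ambiguous probes, as done at the end, is only one possible convention and is an assumption here, not a recommendation from this thread.

```r
# A few rows copied from the lookup table above.
annotLookup <- data.frame(
  PROBEID = c("1007_s_at", "1007_s_at", "1007_s_at",
              "1053_at", "1255_g_at", "1255_g_at"),
  ENSEMBL = c("ENSG00000204580", "ENSG00000137332", "ENSG00000284370",
              "ENSG00000049541", "ENSG00000048545", "ENSG00000287363"),
  SYMBOL  = c("DDR1", "DDR1", "MIR4640", "RFC2", "GUCA1A", "GUCA1A"),
  stringsAsFactors = FALSE
)

# Count the distinct Ensembl IDs reported for each probe.
n_ids <- tapply(annotLookup$ENSEMBL, annotLookup$PROBEID,
                function(x) length(unique(x)))

# Probes that map to more than one Ensembl ID.
ambiguous <- names(n_ids)[n_ids > 1]

# One possible convention (an assumption): keep only one-to-one probes.
unique_map <- annotLookup[!annotLookup$PROBEID %in% ambiguous, ]

print(n_ids)
print(unique_map)
```

This only shows how many probes are affected; it does not decide which of the Ensembl IDs is the appropriate one for a given probe.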

ADD REPLY • link 3.8 years ago by modarzi ▴ 160

0

That's right, Kevin. I have the same problem. Could you please let me know how I can solve this issue?

ADD REPLY • link 2.4 years ago by seta ★ 1.9k