Question: How can I annotate a microarray data set with Ensembl IDs?
modarzi wrote:

Hi,

I am studying a microarray data set whose platform is GPL96 ([HG-U133A]). I used the GPL96 annotation file to annotate the gene expression data and convert probe IDs into gene identifiers. My problem is that the annotation file has three gene identifier columns: "Gene title", "Gene symbol" and "Gene ID". Because multiple probes sometimes map to one gene, I used the aggregate() function as follows:

# Average the sample columns over all probes that share a gene symbol
my_aggregate_Expr_data <- aggregate(my_Expr_data[, -c(1,2)],
                            by = list(gene_name = my_Expr_data$`Gene symbol`),
                            FUN = mean,
                            na.rm = TRUE)

Here, the first column of my_Expr_data is "Probe ID" and the second column is "Gene symbol"; my_Expr_data[, -c(1,2)] drops both so that only the sample columns are averaged.
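As a minimal sketch of the probe-to-gene averaging described above, the following runs aggregate() on a small made-up table. The probe IDs, gene symbols, and expression values are invented for illustration and are not taken from GPL96; only the column layout (probe ID first, gene symbol second, samples after) follows the description.

```r
# Toy expression table; every value here is invented for illustration.
toy_expr <- data.frame(
  `Probe ID`    = c("p1", "p2", "p3", "p4"),
  `Gene symbol` = c("GENE_A", "GENE_A", "GENE_B", NA),
  sample1       = c(2, 4, 10, 7),
  sample2       = c(6, NA, 20, 9),
  check.names   = FALSE
)

# Average all probes that share a gene symbol, sample by sample.
# Rows whose grouping value is NA (probe p4) are dropped by aggregate().
collapsed <- aggregate(
  toy_expr[, -c(1, 2)],
  by    = list(gene_name = toy_expr$`Gene symbol`),
  FUN   = mean,
  na.rm = TRUE
)

print(collapsed)
```

Note that probes without a gene symbol silently disappear from the result, so it may be worth counting them before aggregating.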

However, "Gene symbol" is not a good identifier. I need an identifier such as the Ensembl ID, which indicates the unique position of each gene, but the GPL96 annotation file has no Ensembl ID column for the probes. Given this, can I use "Gene ID" as a unique identifier for each gene associated with one or more probes? I would appreciate it if anybody could share their ideas with me.

Best Regards,

Kevin Blighe wrote:

Hi, I would use hgu133a.db, assuming that you can get the original probe IDs (stored in probes):

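library(hgu133a.db)

# gset is assumed to hold the expression data with probe IDs as row names
probes <- rownames(gset)

# Look up the Ensembl gene ID and gene symbol for each probe ID
annotLookup <- AnnotationDbi::select(hgu133a.db, keys = probes,
  keytype = 'PROBEID',
  columns = c('PROBEID', 'ENSEMBL', 'SYMBOL'))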

You can modify the above code to work for Gene Symbol - to - Ensembl conversion, too.

Kevin
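One possible way to adapt the lookup for gene symbol to Ensembl conversion is sketched below. It assumes that hgu133a.db accepts 'SYMBOL' as a key type (this can be checked with keytypes(hgu133a.db)); the symbols vector and the symbolLookup name are illustrative choices, and the call is skipped if the package is not installed.

```r
# Example gene symbols to convert; replace with your own vector.
symbols <- c("DDR1", "RFC2")

if (requireNamespace("hgu133a.db", quietly = TRUE)) {
  # Same select() call as before, but keyed on gene symbol instead of probe ID.
  symbolLookup <- AnnotationDbi::select(
    hgu133a.db::hgu133a.db,
    keys    = symbols,
    keytype = "SYMBOL",
    columns = c("SYMBOL", "ENSEMBL")
  )
  print(symbolLookup)
} else {
  message("hgu133a.db is not installed; skipping the lookup.")
}
```

As with probe IDs, one symbol can return several Ensembl IDs, so the result may have more rows than the input vector.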


Dear Dr. Blighe,

Thanks for your comment. I also added 'ENTREZID' to your code. But as you can see, for 1007_s_at I get 8 Ensembl IDs; the gene symbols are DDR1 and MIR4640. So my exact question is: which Ensembl ID should I use for 1007_s_at and DDR1? Or by which mechanism can I work out which Ensembl ID is my target for downstream analysis of the 1007_s_at probe?

I would appreciate it if you could share your thoughts with me.

Best Regards,

    PROBEID    ENSEMBL          SYMBOL   ENTREZID
1   1007_s_at  ENSG00000204580  DDR1     780
2   1007_s_at  ENSG00000223680  DDR1     780
3   1007_s_at  ENSG00000229767  DDR1     780
4   1007_s_at  ENSG00000230456  DDR1     780
5   1007_s_at  ENSG00000234078  DDR1     780
6   1007_s_at  ENSG00000137332  DDR1     780
7   1007_s_at  ENSG00000215522  DDR1     780
8   1007_s_at  ENSG00000284370  MIR4640  100616237
9   1053_at    ENSG00000049541  RFC2     5982
10  117_at     ENSG00000173110  HSPA6    3310
11  121_at     ENSG00000125618  PAX8     7849
12  1255_g_at  ENSG00000048545  GUCA1A   2978
13  1255_g_at  ENSG00000287363  GUCA1A   2978
14  1294_at    ENSG00000182179  UBA7     7318
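To see how many probes are affected by this one-to-many situation, one option is to count the distinct Ensembl IDs per probe. The sketch below rebuilds the first ten rows of the table above as a plain data frame (the names annot, n_ensembl, n_symbol, and ambiguous are illustrative) and flags probes that map to more than one Ensembl gene ID. It only detects the ambiguity; it does not decide which ID is the right one.

```r
# First ten rows of the lookup table, copied from the output above.
annot <- data.frame(
  PROBEID = c(rep("1007_s_at", 8), "1053_at", "117_at"),
  ENSEMBL = c("ENSG00000204580", "ENSG00000223680", "ENSG00000229767",
              "ENSG00000230456", "ENSG00000234078", "ENSG00000137332",
              "ENSG00000215522", "ENSG00000284370", "ENSG00000049541",
              "ENSG00000173110"),
  SYMBOL  = c(rep("DDR1", 7), "MIR4640", "RFC2", "HSPA6"),
  stringsAsFactors = FALSE
)

# Count distinct Ensembl IDs and distinct gene symbols for each probe.
n_ensembl <- tapply(annot$ENSEMBL, annot$PROBEID, function(x) length(unique(x)))
n_symbol  <- tapply(annot$SYMBOL,  annot$PROBEID, function(x) length(unique(x)))

# Probes with more than one Ensembl ID need a decision before downstream analysis.
ambiguous <- names(n_ensembl)[n_ensembl > 1]

print(n_ensembl)
print(ambiguous)
```

On a full lookup table, the same counts could be used to separate probes with a single mapping from those that need manual review.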