Question

Mapping probe ids in microarray data to gene ids

4.5 years ago

Natasha ▴ 40

I've performed RMA normalization on the raw intensity files (CEL) of dataset GSE1133. The normalized output is in the following format:

                            GSM18584.CEL GSM18585.CEL GSM18586.CEL GSM18587.CEL
AFFX-18SRNAMur/X00686_3_at     10.324639    10.309749     7.978267     7.784038
AFFX-18SRNAMur/X00686_5_at      9.080051     9.401111     5.540294     5.539700

The data is from platform https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL1073

I would like to map the probe ids to gene ids. I had a look at the table presented in the above link.

The table header presents the following ids

Data table header descriptions
ID  Probe Set Name
Identifier_Source   Identifier_Source
Description Description
CLONE_ID    clone identifier
Sequence_Type   Sequence Type
SEQUENCE    
SPOT_ID Column added by GEO staff to facilitate sequence tracking in Entrez GEO
GB_ACC  GenBank Accession Number

I also downloaded the complete file. I could find gene names in it, but I could not find mappings to identifiers such as Entrez gene ids. I also read that the genome browser at http://genome.ucsc.edu can be used, but I am not sure which of its tools to use.

Could someone suggest how to proceed?

gene microarray • 2.9k views

ATpoint · Answer 1 · 2019-10-19

4.5 years ago

Manoj ▴ 190

You can easily do this in R. This is an example for a human array (hgu133plus2):

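library(hgu133plus2.db)  # annotation package for the human hgu133plus2 array
library(annotate)        # provides getSYMBOL()

# Read the expression table; the probe set ids are expected as row names
probeset.list <- read.table("data.txt")

# Look up the gene symbol for each probe set id
gene.symbols <- getSYMBOL(rownames(probeset.list), "hgu133plus2.db")

# Append the symbols as an extra column and inspect the first rows
results <- cbind(probeset.list, gene.symbols)
print(head(results))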

Hope this helps.

This will not work in this case, because the samples the user is interested in are not from the U133 chip. They are from what seems to be a customised chip called 'GNF1M' (GPL1073).

Natasha, the easiest way is probably to download the 'Annotation SOFT table...' from HERE, read that into R, and then match up this annotation data with your expression matrix. Gene symbols are in column 3 of this annotation file.
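
The matching step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the annotation table is built in memory instead of being read from the downloaded file, its column names (ID, Title, Symbol) and the gene symbols in it are made-up placeholders, and the only thing taken from the thread is that the symbols sit in column 3.

```r
# Toy expression matrix standing in for the RMA output shown in the question.
expr <- matrix(
  c(10.324639, 10.309749, 7.978267, 7.784038,
     9.080051,  9.401111, 5.540294, 5.539700),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("AFFX-18SRNAMur/X00686_3_at", "AFFX-18SRNAMur/X00686_5_at"),
    c("GSM18584.CEL", "GSM18585.CEL", "GSM18586.CEL", "GSM18587.CEL")
  )
)

# Hypothetical annotation table; in practice it would be read from the
# downloaded annotation file. Column names and symbols are placeholders.
annot <- data.frame(
  ID     = c("AFFX-18SRNAMur/X00686_5_at",
             "AFFX-18SRNAMur/X00686_3_at",
             "some_other_probe_at"),
  Title  = c("placeholder title 1", "placeholder title 2", "placeholder title 3"),
  Symbol = c("GeneB", "GeneA", "GeneC"),
  stringsAsFactors = FALSE
)

# For each probe in the expression matrix, find its row in the annotation.
# Probes missing from the annotation get NA.
idx <- match(rownames(expr), annot$ID)

# Put probe id, gene symbol (column 3 of the annotation) and the
# expression values side by side.
annotated <- data.frame(
  ProbeID = rownames(expr),
  Symbol  = annot[idx, 3],
  expr,
  check.names = FALSE,
  row.names = NULL,
  stringsAsFactors = FALSE
)

print(annotated)
```

Using match() keeps the expression matrix in its original row order, regardless of how the annotation file is sorted, and leaves NA for any probe the annotation does not cover.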