Reading microarray data from the Gene Expression Omnibus
1
0
Entering edit mode
2.6 years ago
Caitlin ▴ 100

Hi all.

I am still attempting to extract a list of gene names from a microarray experiment entry in the Gene Expression Omnibus. Unfortunately, my efforts have been in vain. Third parties have suggested the following approach:

library(GEOquery)

gse <- getGEO("GSE12657")
length(gse)
gse <- gse[[1]]

pData(gse) # Display the sample information
fData(gse) # Display the gene annotation data
exprs(gse)[1,] # Display the expression data

What I am desperately seeking is a list of gene names (e.g., BRCA1, etc) that I could then further examine. According to the paper I read, the authors mentioned 702 differentially expressed genes in this specific microarray experiment, but I cannot seem to locate any of them. I emailed the authors, but I did not receive a response.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5661398/

Note: I am now receiving the following error:

Error: object 'gse' not found

Any help in resolving this issue would be greatly appreciated!

Thank you.

cancer Bioconductor R microarray GEOquery • 1.0k views
ADD COMMENT
2
Entering edit mode
2.6 years ago

Hi Caitlin,

For this study, the uploaded data is normalised via GC-RMA, but is not log [base 2] (log2) transformed.

To retrieve it, you need to do:

library(GEOquery)
gset <- getGEO('GSE12657', GSEMatrix = TRUE, getGPL= FALSE)
if (length(gset) > 1) idx <- grep('GPL8300', attr(gset, 'names')) else idx <- 1
gset <- gset[[idx]]
gset <- log2(exprs(gset))

We can see some of the probe names:

head(rownames(gset))
[1] "1000_at"   "1001_at"   "1002_f_at" "1003_s_at" "1004_at"   "1005_at"

We can get the equivalent Ensembl and HGNC IDs for these:

require(hgu95av2.db)
annotLookup <- select(hgu95av2.db, keys = rownames(gset),
  columns = c('PROBEID', 'ENSEMBL', 'SYMBOL'))

# remove probes with any NA mapping, and duplicates
  annotLookup <- annotLookup[!is.na(annotLookup$ENSEMBL) & !is.na(annotLookup$SYMBOL),]
  annotLookup <- annotLookup[!duplicated(annotLookup$PROBEID),]

head(annotLookup)
    PROBEID         ENSEMBL  SYMBOL
1   1000_at ENSG00000102882   MAPK3
2   1001_at ENSG00000066056    TIE1
3 1002_f_at ENSG00000165841 CYP2C19
4 1003_s_at ENSG00000160683   CXCR5
5   1004_at ENSG00000160683   CXCR5
6   1005_at ENSG00000120129   DUSP1

rownames(gset) will not be perfectly aligned with annotLookup; so, watch out for that.

If you want a few lines to align these:

# look up the ensembl ID and gene symbol
  gset <- gset[which(rownames(gset) %in% annotLookup$PROBEID),]
  annotLookup <- annotLookup[which(annotLookup$PROBEID %in% rownames(gset)),]

  annotLookup <- annotLookup[match(rownames(gset), annotLookup$PROBEID),]

  all(rownames(gset) == annotLookup$PROBEID)

Kevin

ADD COMMENT
1
Entering edit mode

Thank you Kevin! I greatly appreciate your help.

Thanks again.

~Caitlin

ADD REPLY

Login before adding your answer.

Traffic: 2950 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6