How to annotate probes in a GEO series matrix?
16 months ago

Hello!

I am working with a GEO series matrix file (accession GSE48452, platform GPL11532), which corresponds to the HuGene-1_1st Affymetrix Human Gene 1.1 ST array. I want to annotate the probes, for example with the gene symbol, in order to create a data table like the following:

                Sample1  Sample2  Sample3  Sample4  Sample5
#CLASS:CANCER   case     case     case     case
#CLASS:SEX      F        F        M        M        F        M        F        M
Gene Symbol
Gene1           -3.06    -2.25    -1.15    -6.64    0.4
Gene2           -1.36    -0.67    -0.17    -0.97    -2.0
Gene3           1.61     -0.27    0.71     -0.62    0.14
Gene4           0.93     1.29     -0.23    -0.74    -2


How can I map the probes to gene symbols while maintaining the row order?

I was using the following R script without success, and I don't know how to proceed.

getGEOdataObjects <- function(x, getGSEobject = FALSE) {
  # Load the GEOquery package (stops with an error if it is not installed)
  library(GEOquery)
  # Use the getGEO() function to download the GEO data for the id stored in x
  GSEDATA <- getGEO(x, GSEMatrix = TRUE, AnnotGPL = FALSE)
  # Inspect the object by printing a summary of the expression values for the first 2 columns
  print(summary(exprs(GSEDATA[[1]])[, 1:2]))

  # Get the eset object
  eset <- GSEDATA[[1]]
  # Save the objects generated for future use in the current working directory
  save(GSEDATA, eset, file = paste(x, ".RData", sep = ""))

  # Check whether we want to return the list object we downloaded from GEO or
  # just the eset object, depending on the getGSEobject argument
  if (getGSEobject) return(GSEDATA) else return(eset)
}
# Store the dataset ids in a vector GEO_DATASETS just in case you want to loop through several GEO ids
GEO_DATASETS <- c("GSE48452")

# Use the function we created to return the eset object
eset <- getGEOdataObjects(GEO_DATASETS[1])
# Inspect the eset object to get the annotation GPL id
eset
# Get the annotation GPL id (see Annotation: GPL11532)
gpl <- getGEO('GPL11532', destdir=".")
Meta(gpl)$title
# Inspect the table of the gpl annotation object
colnames(Table(gpl))
# Get the gene symbol and entrez ids to be used for annotations
Table(gpl)[1:10, c(1, 2, 6, 12)]
dim(Table(gpl))
# Get the gene expression data for all the probes with a gene symbol
geneProbes <- which(!is.na(Table(gpl)$Symbol))
probeids <- as.character(Table(gpl)$ID[geneProbes])
probes <- intersect(probeids, rownames(exprs(eset)))
length(probes)
geneMatrix <- exprs(eset)[probes, ]
inds <- which(Table(gpl)$ID %in% probes)
# Check you get the same probes

# Create the expression matrix with gene ids
geneMatTable <- cbind(geneMatrix, Table(gpl)[inds, c(1, 2, 6, 12)])

# Save a copy of the expression matrix as a csv file
write.csv(geneMatTable, paste(GEO_DATASETS[1], "_DataMatrix.csv", sep=""), row.names=TRUE)


Thank you!

16 months ago
MatthewP ▴ 920

I recommend using the left_join function from the R package dplyr.
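
To make that concrete, here is a minimal sketch on toy data rather than the real GSE48452 download. The probe IDs, values, and the column names ID and Symbol are made up for illustration; check colnames(Table(gpl)) to see what the annotation table for your platform actually calls its columns. The idea is to turn the expression matrix into a data frame with the probe ID as a column, make sure both ID columns have the same type, and left-join the annotation onto it. left_join keeps the rows of the left table in their original order, which is what keeps the expression values and the symbols aligned.

```r
library(dplyr)

# Toy expression matrix: rows are probe IDs, columns are samples
# (stand-in for exprs(eset); all values are made up).
expr <- matrix(
  c(-3.06, -1.36, 1.61,
    -2.25, -0.67, -0.27),
  nrow = 3,
  dimnames = list(c("7892503", "7892501", "7892502"), c("Sample1", "Sample2"))
)

# Toy annotation table (stand-in for Table(gpl)), deliberately in a different
# row order, with one unannotated probe and one probe missing from expr.
annot <- data.frame(
  ID = c(7892501L, 7892502L, 7892503L, 7892504L),
  Symbol = c("Gene2", NA, "Gene1", "Gene4"),
  stringsAsFactors = FALSE
)

# Move the probe IDs from the row names into a regular column.
expr_df <- data.frame(ID = rownames(expr), expr,
                      check.names = FALSE, stringsAsFactors = FALSE)

# Row names are character, so convert the annotation IDs to character too;
# the join can fail if the key columns have different types.
annot$ID <- as.character(annot$ID)

# Left join: one output row per expression row, in the original row order.
annotated <- left_join(expr_df, annot, by = "ID")

# Optionally keep only the probes that have a gene symbol.
annotated_genes <- filter(annotated, !is.na(Symbol))

print(annotated)
```

If one probe maps to several annotation rows, left_join will repeat that probe's expression row once per match, so it is worth comparing nrow(annotated) with nrow(expr) after the join.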