Question

Question on performing RMA normalization

0

Entering edit mode

5.7 years ago

Natasha ▴ 40

I am following the tutorial given here to perform RMA normalization of dataset GSE1133 using the following code in R. cdf libraries of platform gpl1073 and gpl1074 have been installed following the description given here

library(Biobase)
library(GEOquery)
library(affy)
library(limma)
library(hgu133a.db)
library(hgu133acdf)
library(gpl1073cdf) 
library(gpl1074cdf)

# Filter samples of platform 1073
eset <- getGEO("GSE1133", GSEMatrix=TRUE, getGPL=TRUE)
if(length(eset) > 1)
idx <- grep("GPL1073", attr(eset, "names")) else idx <- 1
eset <- eset[[idx]]
phenoData(eset)
files <- sampleNames(eset) # only cel files of platform GPL1073

setwd("/home/GSE1133")

#Download data
getGEOSuppFiles("GSE1133")
setwd("/home/GSE1133/GSE1133")
untar("GSE1133_RAW.tar", exdir="data")
cels = list.files("data/", pattern="CEL")
pattern <- paste(files, sep="", collapse="|")
cels <- grep(pattern, cels, value=TRUE)

# RMA normalization
raw.data = ReadAffy(verbose=FALSE, filenames=cels, cdfname="gpl1073cdf")
data.rma.norm = rma(raw.data)

However, after running this code I get an error that says some of the cel files are not valid. Could someone look into this? I am accessing the files that are downloaded using GEOquery and I am not sure why this error occurs.

Error: the following are not valid files:
    GSM18584.CEL.gz
   GSM18585.CEL.gz
            :
            :

RMA microarray • 3.6k views

ADD COMMENT • link 5.7 years ago by Natasha ▴ 40

score 1 · Answer 1 · 2019-10-18

1

Entering edit mode

5.7 years ago

mathavanbioinfo ▴ 80

hi Unzip and try it again

ADD COMMENT • link 5.7 years ago by mathavanbioinfo ▴ 80

0

Entering edit mode

Hi I tried to unzip using (sapply(paste("data", cels, sep = "/"), gunzip) and run the code again. Get the same error for some reason

Error: the following are not valid files:
    GSM18584.CEL
   GSM18585.CEL
   GSM18586.CEL

ADD REPLY • link 5.7 years ago by Natasha ▴ 40

1

Entering edit mode

Additionally, I do not think your current working directory is correct.

Your current working directory is: /home/GSE1133/GSE1133 but your CEL files are stored in the data directory.

However, when you run cels = list.files("data/", pattern="CEL"), the cels variable will only containing the filenames, e.g. GSM18584.CEL rather than data/GSM18584.CEL

Why don't you try the following instead: raw.data = ReadAffy(verbose=FALSE, filenames=paste("data", cels, sep = "/"), cdfname="gpl1073cdf")

ADD REPLY • link 5.7 years ago by dsull ★ 7.6k

0

Entering edit mode

Hi, Thanks a lot. That was my mistake. I could run the code and perform normalization

Background correcting
Normalizing
Calculating Expression
                            GSM18584.CEL GSM18585.CEL GSM18586.CEL GSM18587.CEL
AFFX-18SRNAMur/X00686_3_at     10.324639    10.309749     7.978267     7.784038
AFFX-18SRNAMur/X00686_5_at      9.080051     9.401111     5.540294     5.539700

However, I am not sure how to map the probe ids to gene ids.

The data is from platform https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL1073

Any suggestion on how the probes can be mapped to gene ids?

ADD REPLY • link 5.7 years ago by Natasha ▴ 40

0

Entering edit mode

That is a separate question, but the annotation is available at the link that you have provided. Click on Download full table... - the gene names can be found in that file, but you will have to parse them out.

ADD REPLY • link 5.7 years ago by Kevin Blighe 89k

0

Entering edit mode

@Kevin Blighe Thanks a lot for the response. I created a new post for this question here.

ADD REPLY • link 5.7 years ago by Natasha ▴ 40