Question: Question on performing RMA normalization
0
gravatar for Natasha
7 months ago by
Natasha40
Natasha40 wrote:

I am following the tutorial given here to perform RMA normalization of dataset GSE1133 using the following code in R. cdf libraries of platform gpl1073 and gpl1074 have been installed following the description given here

library(Biobase)
library(GEOquery)
library(affy)
library(limma)
library(hgu133a.db)
library(hgu133acdf)
library(gpl1073cdf) 
library(gpl1074cdf)

# Filter samples of platform 1073
eset <- getGEO("GSE1133", GSEMatrix=TRUE, getGPL=TRUE)
if(length(eset) > 1)
idx <- grep("GPL1073", attr(eset, "names")) else idx <- 1
eset <- eset[[idx]]
phenoData(eset)
files <- sampleNames(eset) # only cel files of platform GPL1073

setwd("/home/GSE1133")

#Download data
getGEOSuppFiles("GSE1133")
setwd("/home/GSE1133/GSE1133")
untar("GSE1133_RAW.tar", exdir="data")
cels = list.files("data/", pattern="CEL")
pattern <- paste(files, sep="", collapse="|")
cels <- grep(pattern, cels, value=TRUE)

# RMA normalization
raw.data = ReadAffy(verbose=FALSE, filenames=cels, cdfname="gpl1073cdf")
data.rma.norm = rma(raw.data)

However, after running this code I get an error that says some of the cel files are not valid. Could someone look into this? I am accessing the files that are downloaded using GEOquery and I am not sure why this error occurs.

Error: the following are not valid files:
    GSM18584.CEL.gz
   GSM18585.CEL.gz
            :
            :
microarray rma • 431 views
ADD COMMENTlink modified 7 months ago • written 7 months ago by Natasha40
1
gravatar for mathavanbioinfo
7 months ago by
India
mathavanbioinfo50 wrote:

hi Unzip and try it again

ADD COMMENTlink written 7 months ago by mathavanbioinfo50

Hi I tried to unzip using (sapply(paste("data", cels, sep = "/"), gunzip) and run the code again. Get the same error for some reason

Error: the following are not valid files:
    GSM18584.CEL
   GSM18585.CEL
   GSM18586.CEL
ADD REPLYlink written 7 months ago by Natasha40
1

Additionally, I do not think your current working directory is correct.

Your current working directory is: /home/GSE1133/GSE1133 but your CEL files are stored in the data directory.

However, when you run cels = list.files("data/", pattern="CEL"), the cels variable will only containing the filenames, e.g. GSM18584.CEL rather than data/GSM18584.CEL

Why don't you try the following instead: raw.data = ReadAffy(verbose=FALSE, filenames=paste("data", cels, sep = "/"), cdfname="gpl1073cdf")

ADD REPLYlink modified 7 months ago • written 7 months ago by dsull1.2k

Hi, Thanks a lot. That was my mistake. I could run the code and perform normalization

Background correcting
Normalizing
Calculating Expression
                            GSM18584.CEL GSM18585.CEL GSM18586.CEL GSM18587.CEL
AFFX-18SRNAMur/X00686_3_at     10.324639    10.309749     7.978267     7.784038
AFFX-18SRNAMur/X00686_5_at      9.080051     9.401111     5.540294     5.539700

However, I am not sure how to map the probe ids to gene ids.

The data is from platform https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL1073

Any suggestion on how the probes can be mapped to gene ids?

ADD REPLYlink written 7 months ago by Natasha40

That is a separate question, but the annotation is available at the link that you have provided. Click on Download full table... - the gene names can be found in that file, but you will have to parse them out.

ADD REPLYlink written 7 months ago by Kevin Blighe60k

@Kevin Blighe Thanks a lot for the response. I created a new post for this question here.

ADD REPLYlink written 7 months ago by Natasha40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2200 users visited in the last hour