Question

Reading and normalizing raw data in R

0

2.8 years ago

Fate • 0

I want to normalize the GEO data series GSE36040 using R.

I downloaded the data with getGEOSuppFiles("GSE36040").

The result is a tar archive named GSE36040_RAW.tar. I extracted it, and it contains the gzipped sample files (for example GSM879567_US22502590_251524110570_S01_GE1-v5_95_Feb07_1_2.txt.gz).

I want to read the data into R and normalize it, but the ReadAffy() function returns this error:

No cel filennames specified and no cel files in specified directory.

I don't know how to convert my files to CEL files.

I also tried the untar() function, but it gives this warning message:

In untar("GSE36040/GSE36040_RAW.tar", exdir = "data") :
‘tar.exe -xf "GSE36040/GSE36040_RAW.tar" -C "data"’ returned error code 1

I would be grateful if someone could guide me on how to open and read my data in R.

GEO normalization R • 1.4k views

ADD COMMENT • link 2.7 years ago by Fate • 0

score 1 · Answer 1 · 2021-07-19

1

2.8 years ago

seidel 11k

You won't be able to analyze that data set with ReadAffy(), because it is not Affymetrix data and therefore there are no CEL files. It is Agilent array data: the files have Agilent names and represent Agilent array scans. You can still analyze it in R; you might try the limma package.

ADD COMMENT • link 2.8 years ago by seidel 11k

0

Thank you for your response. I have downloaded the limma user's guide. In section 4.5, Reading Single-Channel Agilent Intensity Data, there is a call x <- read.maimages(files, source="agilent", green.only=TRUE), where files is described as "a character vector containing the names of the image analysis output files". I didn't quite understand that. The file I downloaded is GSE36040_RAW.tar, with the gzipped GSM files inside. How can I make a vector containing all of these gzipped GSM files? I mean, how can I combine them?

ADD REPLY • link 2.7 years ago by Fate • 0

0

Uncompress the GSM files (GSM879567_US22502590_251524110570_S01_GE1-v5_95_Feb07_1_2.txt.gz) so that they are plain text files. Then you can read them in one by one with the read.maimages() function. Depending on how you plan to analyze the data, you could create a list of data sets using an apply function (lapply, for instance). If you had a directory listing of all the uncompressed GSMnnn_nnn_etc.txt files in a vector, you could treat it as a list: x <- lapply(as.list(GSM_files), read.maimages, source="agilent", green.only=TRUE). (I think that will work.) Or you could use a for loop. Then you can pull out the columns of interest and consolidate them somehow.
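The extract, uncompress, and read steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the tar path is the one quoted in the question, the `data` output directory and the helper names `extract_raw()` and `gunzip_all()` are hypothetical, and `tar = "internal"` is used simply to avoid relying on an external tar.exe. Note that read.maimages() also accepts a character vector of file names directly, in which case it returns a single EListRaw object rather than a list of them.

```r
# Sketch only: the tar path comes from the question; the "data" directory
# and both helper function names are hypothetical.

# Extract the archive with R's built-in tar implementation.
extract_raw <- function(tarfile, exdir = "data") {
  dir.create(exdir, showWarnings = FALSE, recursive = TRUE)
  untar(tarfile, exdir = exdir, tar = "internal")
  invisible(exdir)
}

# Decompress every *.txt.gz file in a directory using base R only,
# and return the paths of the resulting plain-text files.
gunzip_all <- function(dir) {
  gz_files <- list.files(dir, pattern = "\\.txt\\.gz$", full.names = TRUE)
  for (gz in gz_files) {
    con <- gzfile(gz, open = "rt")
    lines <- readLines(con)
    close(con)
    writeLines(lines, sub("\\.gz$", "", gz))
  }
  list.files(dir, pattern = "\\.txt$", full.names = TRUE)
}

tarfile <- "GSE36040/GSE36040_RAW.tar"
if (file.exists(tarfile) && requireNamespace("limma", quietly = TRUE)) {
  txt_files <- gunzip_all(extract_raw(tarfile))
  # Passing the whole vector of file names gives one EListRaw object
  # with one column per array.
  raw <- limma::read.maimages(txt_files, source = "agilent", green.only = TRUE)
}
```

Reading all files in one call keeps the arrays together in a single object, which is the form limma's downstream functions generally expect.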

ADD REPLY • link 2.7 years ago by seidel 11k

0

Thank you so much. I first made a vector of the text file names using the list.files function: my_data= list.files(path = ".", pattern = NULL, all.files = FALSE, full.names = FALSE, recursive = FALSE, ignore.case = FALSE, include.dirs = FALSE, no.. = TRUE). Then I ran the lapply function: data <- lapply(as.list(my_data),read.maimages, source="agilent", green.only=TRUE). It read the text files successfully, judging by the console output. Then I performed background correction and normalization between arrays as follows:

data_1 = backgroundCorrect(data, method = "normexp")


normalized_data = normalizeBetweenArrays(data, method="quantile")

But I get this error for the first one:

Array 1Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) : 'x' must be atomic

I thought removing the as.list part might solve this error, but it didn't. I did a Google search, and it suggested that the list might consist of a single element, which is not the case here.

Would you kindly look at my code and tell me where I went wrong?

ADD REPLY • link 2.7 years ago by Fate • 0
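
One plausible reading of the error above, offered as a hedged note rather than a confirmed diagnosis: lapply() returns a plain R list of EListRaw objects, whereas backgroundCorrect() and normalizeBetweenArrays() expect a single EListRaw (or matrix), so the "'x' must be atomic" message is consistent with a list being passed in. Note also that the second call in the post uses data rather than the background-corrected data_1. The sketch below shows the same two steps on a single EListRaw; the intensities are synthetic, made-up numbers standing in for what read.maimages(my_data, source="agilent", green.only=TRUE) would return when given the whole vector of file names.

```r
library(limma)

# Synthetic stand-in for two single-channel arrays (made-up numbers,
# used only so the example runs without the GEO files).
set.seed(1)
n_probes <- 500
E  <- matrix(rexp(n_probes * 2, rate = 1 / 200) + 60, ncol = 2)  # foreground
Eb <- matrix(rnorm(n_probes * 2, mean = 50, sd = 5), ncol = 2)   # background
raw <- new("EListRaw", list(E = E, Eb = Eb))

# Background-correct the single EListRaw object ...
bc <- backgroundCorrect(raw, method = "normexp")

# ... and normalize the corrected object (not the raw one).
# For an EListRaw input this returns an EList of log2 intensities.
norm <- normalizeBetweenArrays(bc, method = "quantile")
```

With real data, the only change would be to build raw from the files instead of from simulated matrices.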