Question

How can I read CEL files with affy?

7.5 years ago

zizigolu ★ 4.3k

Hi,

I am trying to read my CEL files from the [Mouse430_2] Affymetrix Mouse Genome 430 2.0 Array, but I get this error:

library(affy)

Data<-ReadAffy()

Error in affyio::read_abatch(filenames, rm.mask, rm.outliers, rm.extra, : Cel file C:/Users/Lenovo/Desktop/GSE50833_RAW/GSE10000_RAW/GSM44660.CEL/GSM252007.CEL does not seem to have the correct dimensions

library(oligo)  # read.celfiles() comes from the oligo package

celfiles <- list.files("GSE10000/CEL", full.names = TRUE)

rawData <- read.celfiles(celfiles)

All the CEL files must be of the same type. Error: checkChipTypes(filenames, verbose, "affymetrix", TRUE) is not TRUE

How can I read my CEL files for normalization?

R software error • 15k views

ADD COMMENT • link updated 4.1 years ago by Biostar 20 • written 7.5 years ago by zizigolu ★ 4.3k

Hi, F

It seems to be a dimension problem.

ADD REPLY • link 7.5 years ago by Farbod ★ 3.4k

table(sapply(celf2, annotation))

    hgu133a     hgu133b hgu133plus2    hgu95av2 
        306         115         349          49

library(affy)
hgu133a <- ReadAffy(filenames = celf2$hgu133a)

Error in affyio::read_abatch(filenames, rm.mask, rm.outliers, rm.extra, : Cel file /Volumes/新加卷 1/GBM_cel/GSE13041/GSM326852.CEL.gz does not seem to have the correct dimensions

ADD REPLY • link 4.3 years ago by jianguo.zhou • 0

Please read the accepted answer, which should cover your problem. Check whether you are trying to load multiple different array types. If this does not help, please open a new question.

ADD REPLY • link 4.3 years ago by ATpoint 82k

score 7 · Accepted Answer · 2016-10-30

7.5 years ago

ddiez ★ 2.0k

The code you show is intriguing because the error message references the GSE10000 dataset, which is indeed Mouse430_2 arrays, but also GSE50833, which are Agilent-028005 SurePrint G3 arrays. At any rate, the error suggests that you are trying to read different array types with ReadAffy(), which fails because different arrays have different dimensions. The first thing I would do is make sure that all the files are from the same platform/array.

EDIT

I could replicate the problem and confirm my guess using the following experiment (there must be a better way to do this than reading the whole set of files one by one):

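library(affy)  # provides ReadAffy() and the annotation() method used below
f <- list.files(pattern = "CEL.gz")
# Read each file into its own AffyBatch, then tabulate the chip annotation
celf <- lapply(f, function(x) ReadAffy(filenames = x))
table(sapply(celf, annotation))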
 mouse4302 mouse430a2 
        18         17

The solution is to read them separately.

EDIT 2

OK, this is a much faster way to check the chip type of a set of CEL files, because it reads only the file headers:

library(affyio)
f <- list.files(pattern = "CEL.gz")
table(sapply(f, function(x) read.celfile.header(x)$cdfName))
 Mouse430_2 Mouse430A_2 
         18          17

EDIT 3

And this is how you can use the information above to read the files in different batches:

ff <- split(f, sapply(f, function(x) read.celfile.header(x)$cdfName))
ff
$Mouse430_2
 [1] "GSM250879.CEL.gz" "GSM250880.CEL.gz" "GSM250881.CEL.gz" "GSM250882.CEL.gz" "GSM250919.CEL.gz" "GSM250920.CEL.gz"
 [7] "GSM250922.CEL.gz" "GSM250923.CEL.gz" "GSM250925.CEL.gz" "GSM250927.CEL.gz" "GSM250928.CEL.gz" "GSM250943.CEL.gz"
[13] "GSM44658.CEL.gz"  "GSM44659.CEL.gz"  "GSM44660.CEL.gz"  "GSM44661.CEL.gz"  "GSM44662.CEL.gz"  "GSM44663.CEL.gz" 

$Mouse430A_2
 [1] "GSM252007.CEL.gz" "GSM252008.CEL.gz" "GSM252009.CEL.gz" "GSM252010.CEL.gz" "GSM252011.CEL.gz" "GSM252014.CEL.gz"
 [7] "GSM252015.CEL.gz" "GSM252016.CEL.gz" "GSM252017.CEL.gz" "GSM252018.CEL.gz" "GSM252021.CEL.gz" "GSM252022.CEL.gz"
[13] "GSM252033.CEL.gz" "GSM252040.CEL.gz" "GSM252051.CEL.gz" "GSM252052.CEL.gz" "GSM252053.CEL.gz"

library(affy)
abatch1 <- ReadAffy(filenames = ff$Mouse430_2)
abatch2 <- ReadAffy(filenames = ff$Mouse430A_2)

And so on.
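If there are more than two chip types, the two explicit calls above can be folded into a loop. The following is only a sketch: it assumes the affy and affyio packages (and the CDF packages for your chips) are installed, and normalize_by_chip is a hypothetical helper name, not a function from either package.

```r
# Sketch only: assumes the affy and affyio Bioconductor packages are installed.
# normalize_by_chip is a hypothetical helper name, not part of either package.
normalize_by_chip <- function(files) {
  # Read only the headers to find the chip type (cdfName) of each file.
  chip <- vapply(files,
                 function(x) affyio::read.celfile.header(x)$cdfName,
                 character(1))
  # Group the file names by chip type, then read and RMA-normalize each group.
  batches <- split(files, chip)
  lapply(batches, function(batch) {
    abatch <- affy::ReadAffy(filenames = batch)
    affy::rma(abatch)  # one ExpressionSet per chip type
  })
}

# Usage (run in the folder holding the CEL files):
# f <- list.files(pattern = "CEL.gz")
# esets <- normalize_by_chip(f)
# names(esets)  # one entry per chip type
```

Each element of the returned list is normalized on its own, so the expression values are comparable within a chip type but not necessarily across chip types.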

ADD COMMENT • link 7.5 years ago by ddiez ★ 2.0k

I think "GSE50833_RAW/GSE10000_RAW/" is just a case of misleading folder naming.

ADD REPLY • link 7.5 years ago by Farbod ★ 3.4k

Absolutely. Better to keep each dataset in its own folder.

ADD REPLY • link 7.5 years ago by ddiez ★ 2.0k

Thank you, but GSE50833_RAW is the name of the folder in which GSE10000 is located :( :( :(

ADD REPLY • link 7.5 years ago by zizigolu ★ 4.3k

I see. But nothing prevents you from moving the folder to its own location, right? I am not trying to impose my own logic about file organization (mainly because it is often far from perfect or rational), but in this particular case I would keep the datasets in different folders.

ADD REPLY • link 7.5 years ago by ddiez ★ 2.0k

thank you

f <- list.files(pattern = "CEL.gz")

celf <- lapply(f, function(x) ReadAffy(filenames = x))

table(sapply(cdfs, annotation))

Error in sapply(cdfs, annotation) : object 'cdfs' not found

eset<-rma(celf)

Error in (function (classes, fdef, mtable) :

unable to find an inherited method for function ‘rma’ for signature ‘"list"’

Data<-ReadAffy()

Error in affyio::read_abatch(filenames, rm.mask, rm.outliers, rm.extra, :

Cel file C:/Users/Lenovo/Desktop/GSE10000_RAW/GSM252007.CEL.gz does not seem to have the correct dimensions

ADD REPLY • link 7.5 years ago by zizigolu ★ 4.3k

Sorry, I made a last-minute change to a variable name without checking that it still worked, which is always a bad idea. Use celf instead of cdfs. Anyway, I added a better way to do the same thing without having to read all the files (which is much more efficient if you have lots of files).

ADD REPLY • link 7.5 years ago by ddiez ★ 2.0k

Thank you,

your second edit ran without error:

library(affyio)

f <- list.files(pattern = "CEL.gz")

table(sapply(f, function(x) read.celfile.header(x)$cdfName))

 Mouse430_2 Mouse430A_2 
         18          17

Sorry, but how do I carry on with normalization from here?

I want to run

Data<-ReadAffy()

eset<-rma(Data)

but I can't figure out how to connect the table output to ReadAffy().

ADD REPLY • link 7.5 years ago by zizigolu ★ 4.3k

Well, it is difficult to answer. I would use the information above about the different platforms to read the two sets of files separately (i.e. two calls to ReadAffy()) and process them independently (two calls to rma()). Then you will have to figure out how to combine them. Maybe extract the matrices and put them together, with missing values for the non-matching probesets? It is possible, but I am not sure whether it is a good idea. Anyway, the first thing I would do is find out how the original authors did it. The dataset (GSE10000) has been published, and the paper may say something about the two different platforms.
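To make the "put them together with missing values" idea concrete, here is a rough base R sketch. The matrices m1 and m2 stand in for the expression matrices you would extract from each ExpressionSet with exprs(); the probeset IDs, sample names and values are made up, and combine_with_na is a hypothetical helper. It assumes the sample names do not overlap between the two matrices, and it does not address whether values from the two platforms are actually comparable.

```r
# Sketch only: m1 and m2 stand in for exprs(eset1) and exprs(eset2);
# the probeset IDs and values below are made up for illustration.
m1 <- matrix(c(7.1, 8.2, 6.5,
               7.3, 8.0, 6.7),
             nrow = 3,
             dimnames = list(c("ps_a", "ps_b", "ps_c"), c("s1", "s2")))
m2 <- matrix(c(7.0, 8.4),
             nrow = 2,
             dimnames = list(c("ps_a", "ps_b"), "s3"))

# Hypothetical helper: stack samples side by side over the union of
# probesets, leaving NA where a probeset is absent from one platform.
combine_with_na <- function(a, b) {
  ids <- union(rownames(a), rownames(b))
  out <- matrix(NA_real_, nrow = length(ids), ncol = ncol(a) + ncol(b),
                dimnames = list(ids, c(colnames(a), colnames(b))))
  out[rownames(a), colnames(a)] <- a
  out[rownames(b), colnames(b)] <- b
  out
}

combined <- combine_with_na(m1, m2)
```

Here ps_c exists only in m1, so its entry for sample s3 stays NA in the combined matrix.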

ADD REPLY • link 7.5 years ago by ddiez ★ 2.0k

Added some extra help in my answer regarding how to use the information about the platform to read the files.

ADD REPLY • link 7.5 years ago by ddiez ★ 2.0k

Thank you. I had been confused since yesterday, and you finally clarified the source of the error.

ADD REPLY • link 7.5 years ago by zizigolu ★ 4.3k