Question: Microarray data analysis - pipeline (problem with custom CDF)
gravatar for lessismore
24 months ago by
lessismore670 wrote:

Hello everybody,

i am analyzing for microarray data in Arabidopsis. I took advantage from the code here supplied (sharing some naive codes for microarray normalization in R with whom are too new in R alike me) but im ending up with a different number of analysed genes in my final expression file.

Then i got that the problem is that it has been used a custom CDF file. Could someone tell me how could i include the custom CDF file (supplied in ArrayExpress) in my analysis? here is the dataset i am looking at:

here is the code i used:


# To read all CEL files in the working directory:
Data <- ReadAffy()

eset <- rma(Data) <- exprs(eset)

# The R object contains the normalized expression for every probeset in the ATH1 microarrays used in this example. In order to convert the probeset IDs to Arabidopsis gene identifiers, the file download from the TAIR database and place in the folder with the microarray data. In order to avoid ambiguous probeset associations (i.e. probesets that have multiple matches to genes), we only used probes that match only one gene in the Arabidopsis genome.
affy_names <- read.delim("affy_ATH1_array_elements-2010-12-20.txt",header=T)

# Select the columns that contain the probeset ID and corresponding AGI number. Please note that the positions used to index the matrix depend on the input format of the array elements file. You can change these numbers to index the corresponding columns if you are using a different format:
probe_agi <- as.matrix(affy_names[,c(1,5)])

# To associate the probeset with the corresponding AGI locus:
normalized.names <-merge(probe_agi,,by.x=1,by.y=0)[,-1]

# To remove probesets that do not match the Arabidopsis genome:
normalized.arabidopsis <- normalized.names[grep("AT",normalized.names[,1]),]

# To remove ambiguous probes:
normalized.arabidopsis.unambiguous <- normalized.arabidopsis[grep(pattern="",normalized.arabidopsis[,1], invert=T),]

# In some cases, multiple probes match the same gene, due to updates in the annotation of the genome. To remove duplicated genes in the matrix: <- normalized.arabidopsis.unambiguous[!duplicated(normalized.arabidopsis.unambiguous[,1]),]

# To assign the AGI number as row name:
rownames( <-[,1] <-[,-1]

#The resulting gene expression dataset contains unique row identifies (i.e. AGI locus), and different expression values obtained from different experiments on each column

# To export this data matrix from R to a tab-delimited file use the following command. The file will be written to the folder that you set up as your working directory in R using the setwd() command in line 1 above:
write.table (,"RMA.txt", sep="\t",col.names=NA,quote=F)
ADD COMMENTlink modified 23 months ago by mannoulag160 • written 24 months ago by lessismore670

Hi, I used your code to analyse my dataset then I want to see the final matrix but I have an error

Error in View( : argument 'x' incorrect
ADD REPLYlink modified 23 months ago • written 23 months ago by mannoulag160

try to export and reimport it.

ADD REPLYlink written 23 months ago by lessismore670

yes I do it but I have 0 rows. to show the data, I do:


    [1] locus            GSM131576.CEL.gz GSM131577.CEL.gz GSM131578.CEL.gz GSM131579.CEL.gz GSM131580.CEL.gz
     [7] GSM131581.CEL.gz GSM131582.CEL.gz GSM131583.CEL.gz GSM131584.CEL.gz GSM131585.CEL.gz GSM131586.CEL.gz
    [13] GSM131587.CEL.gz GSM131588.CEL.gz GSM131589.CEL.gz GSM131590.CEL.gz GSM131591.CEL.gz GSM131592.CEL.gz
    [19] GSM131593.CEL.gz GSM131594.CEL.gz GSM131595.CEL.gz GSM131596.CEL.gz GSM131597.CEL.gz GSM131598.CEL.gz
    [25] GSM131599.CEL.gz GSM131600.CEL.gz GSM131601.CEL.gz GSM131602.CEL.gz GSM131603.CEL.gz GSM131604.CEL.gz
    [31] GSM131605.CEL.gz GSM131606.CEL.gz GSM131607.CEL.gz GSM131608.CEL.gz GSM131609.CEL.gz GSM131610.CEL.gz
    [37] GSM131611.CEL.gz GSM131612.CEL.gz GSM131613.CEL.gz GSM131614.CEL.gz GSM131615.CEL.gz GSM131616.CEL.gz
    [43] GSM131617.CEL.gz GSM131618.CEL.gz GSM131619.CEL.gz GSM131620.CEL.gz GSM131621.CEL.gz GSM131622.CEL.gz
    [49] GSM131623.CEL.gz GSM131624.CEL.gz GSM131625.CEL.gz GSM131626.CEL.gz GSM131627.CEL.gz GSM131628.CEL.gz
    [55] GSM131629.CEL.gz GSM131630.CEL.gz GSM131631.CEL.gz GSM131632.CEL.gz GSM131633.CEL.gz GSM131634.CEL.gz
    [61] GSM131635.CEL.gz GSM131636.CEL.gz GSM131637.CEL.gz GSM131638.CEL.gz GSM131639.CEL.gz GSM131640.CEL.gz
    [67] GSM131641.CEL.gz
    <0 lignes> (ou 'row.names' de longueur nulle)
ADD REPLYlink modified 23 months ago • written 23 months ago by mannoulag160

That one was the code i initially used because i had not a custom cdf, then i found it. So my pipeline was:

# load the cdf package in your R environment 
# make sure you set your working directory where you have all CEL files

library (Affy)

# In my case i have the custom CDF so..
Data <- ReadAffy(cdfname = "cdfname")

# Normalization
eset <- rma(Data) <- exprs(eset)

# Export Data
write.table (, "", sep="\t",col.names=NA,quote=F)
ADD REPLYlink written 23 months ago by lessismore670
gravatar for lessismore
24 months ago by
lessismore670 wrote:

Solved myself.

Data <- ReadAffy(cdfname = "yourcdffilename")

Important: your cdf object needs to be present in your R environment. Normally there is a package to install to do it.

ADD COMMENTlink written 24 months ago by lessismore670
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1442 users visited in the last hour