Question: Microarray data analysis - pipeline (problem with custom CDF)
gravatar for lessismore
3.1 years ago by
lessismore890 wrote:

Hello everybody,

i am analyzing for microarray data in Arabidopsis. I took advantage from the code here supplied (sharing some naive codes for microarray normalization in R with whom are too new in R alike me) but im ending up with a different number of analysed genes in my final expression file.

Then i got that the problem is that it has been used a custom CDF file. Could someone tell me how could i include the custom CDF file (supplied in ArrayExpress) in my analysis? here is the dataset i am looking at:

here is the code i used:


# To read all CEL files in the working directory:
Data <- ReadAffy()

eset <- rma(Data) <- exprs(eset)

# The R object contains the normalized expression for every probeset in the ATH1 microarrays used in this example. In order to convert the probeset IDs to Arabidopsis gene identifiers, the file download from the TAIR database and place in the folder with the microarray data. In order to avoid ambiguous probeset associations (i.e. probesets that have multiple matches to genes), we only used probes that match only one gene in the Arabidopsis genome.
affy_names <- read.delim("affy_ATH1_array_elements-2010-12-20.txt",header=T)

# Select the columns that contain the probeset ID and corresponding AGI number. Please note that the positions used to index the matrix depend on the input format of the array elements file. You can change these numbers to index the corresponding columns if you are using a different format:
probe_agi <- as.matrix(affy_names[,c(1,5)])

# To associate the probeset with the corresponding AGI locus:
normalized.names <-merge(probe_agi,,by.x=1,by.y=0)[,-1]

# To remove probesets that do not match the Arabidopsis genome:
normalized.arabidopsis <- normalized.names[grep("AT",normalized.names[,1]),]

# To remove ambiguous probes:
normalized.arabidopsis.unambiguous <- normalized.arabidopsis[grep(pattern="",normalized.arabidopsis[,1], invert=T),]

# In some cases, multiple probes match the same gene, due to updates in the annotation of the genome. To remove duplicated genes in the matrix: <- normalized.arabidopsis.unambiguous[!duplicated(normalized.arabidopsis.unambiguous[,1]),]

# To assign the AGI number as row name:
rownames( <-[,1] <-[,-1]

#The resulting gene expression dataset contains unique row identifies (i.e. AGI locus), and different expression values obtained from different experiments on each column

# To export this data matrix from R to a tab-delimited file use the following command. The file will be written to the folder that you set up as your working directory in R using the setwd() command in line 1 above:
write.table (,"RMA.txt", sep="\t",col.names=NA,quote=F)
ADD COMMENTlink modified 3.1 years ago by mannoulag160 • written 3.1 years ago by lessismore890

Hi, I used your code to analyse my dataset then I want to see the final matrix but I have an error

Error in View( : argument 'x' incorrect
ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by mannoulag160

try to export and reimport it.

ADD REPLYlink written 3.1 years ago by lessismore890

yes I do it but I have 0 rows. to show the data, I do:


    [1] locus            GSM131576.CEL.gz GSM131577.CEL.gz GSM131578.CEL.gz GSM131579.CEL.gz GSM131580.CEL.gz
     [7] GSM131581.CEL.gz GSM131582.CEL.gz GSM131583.CEL.gz GSM131584.CEL.gz GSM131585.CEL.gz GSM131586.CEL.gz
    [13] GSM131587.CEL.gz GSM131588.CEL.gz GSM131589.CEL.gz GSM131590.CEL.gz GSM131591.CEL.gz GSM131592.CEL.gz
    [19] GSM131593.CEL.gz GSM131594.CEL.gz GSM131595.CEL.gz GSM131596.CEL.gz GSM131597.CEL.gz GSM131598.CEL.gz
    [25] GSM131599.CEL.gz GSM131600.CEL.gz GSM131601.CEL.gz GSM131602.CEL.gz GSM131603.CEL.gz GSM131604.CEL.gz
    [31] GSM131605.CEL.gz GSM131606.CEL.gz GSM131607.CEL.gz GSM131608.CEL.gz GSM131609.CEL.gz GSM131610.CEL.gz
    [37] GSM131611.CEL.gz GSM131612.CEL.gz GSM131613.CEL.gz GSM131614.CEL.gz GSM131615.CEL.gz GSM131616.CEL.gz
    [43] GSM131617.CEL.gz GSM131618.CEL.gz GSM131619.CEL.gz GSM131620.CEL.gz GSM131621.CEL.gz GSM131622.CEL.gz
    [49] GSM131623.CEL.gz GSM131624.CEL.gz GSM131625.CEL.gz GSM131626.CEL.gz GSM131627.CEL.gz GSM131628.CEL.gz
    [55] GSM131629.CEL.gz GSM131630.CEL.gz GSM131631.CEL.gz GSM131632.CEL.gz GSM131633.CEL.gz GSM131634.CEL.gz
    [61] GSM131635.CEL.gz GSM131636.CEL.gz GSM131637.CEL.gz GSM131638.CEL.gz GSM131639.CEL.gz GSM131640.CEL.gz
    [67] GSM131641.CEL.gz
    <0 lignes> (ou 'row.names' de longueur nulle)
ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by mannoulag160

That one was the code i initially used because i had not a custom cdf, then i found it. So my pipeline was:

# load the cdf package in your R environment 
# make sure you set your working directory where you have all CEL files

library (Affy)

# In my case i have the custom CDF so..
Data <- ReadAffy(cdfname = "cdfname")

# Normalization
eset <- rma(Data) <- exprs(eset)

# Export Data
write.table (, "", sep="\t",col.names=NA,quote=F)
ADD REPLYlink written 3.1 years ago by lessismore890
gravatar for lessismore
3.1 years ago by
lessismore890 wrote:

Solved myself.

Data <- ReadAffy(cdfname = "yourcdffilename")

Important: your cdf object needs to be present in your R environment. Normally there is a package to install to do it.

ADD COMMENTlink written 3.1 years ago by lessismore890
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1504 users visited in the last hour