I am analyzing microarray data from Arabidopsis. I used the code supplied here (shared as some naive code for microarray normalization in R, for people who are as new to R as I am), but I am ending up with a different number of analysed genes in my final expression file.
I then realized that the problem is that a custom CDF file was used. Could someone tell me how I can include the custom CDF file (supplied in ArrayExpress) in my analysis? Here is the dataset I am looking at: https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-5632/files/
Here is the code I used:
library(affy)

# To read all CEL files in the working directory:
Data <- ReadAffy()
eset <- rma(Data)
norm.data <- exprs(eset)

# The norm.data R object contains the normalized expression for every probeset
# in the ATH1 microarrays used in this example. In order to convert the probeset
# IDs to Arabidopsis gene identifiers, download the file
# ftp://ftp.arabidopsis.org/home/tair/Microarrays/Affymetrix/affy_ATH1_array_elements-2010-12-20.txt
# from the TAIR database and place it in the folder with the microarray data.
# In order to avoid ambiguous probeset associations (i.e. probesets that have
# multiple matches to genes), we only used probes that match only one gene in
# the Arabidopsis genome.
affy_names <- read.delim("affy_ATH1_array_elements-2010-12-20.txt", header=TRUE)

# Select the columns that contain the probeset ID and corresponding AGI number.
# Please note that the positions used to index the matrix depend on the input
# format of the array elements file. You can change these numbers to index the
# corresponding columns if you are using a different format:
probe_agi <- as.matrix(affy_names[,c(1,5)])

# To associate the probeset with the corresponding AGI locus:
normalized.names <- merge(probe_agi, norm.data, by.x=1, by.y=0)[,-1]

# To remove probesets that do not match the Arabidopsis genome:
normalized.arabidopsis <- normalized.names[grep("AT", normalized.names[,1]),]

# To remove ambiguous probes (this assumes that multiple AGI loci for one
# probeset are separated by ";" in the annotation file):
normalized.arabidopsis.unambiguous <- normalized.arabidopsis[grep(pattern=";", normalized.arabidopsis[,1], invert=TRUE),]

# In some cases, multiple probes match the same gene, due to updates in the
# annotation of the genome. To remove duplicated genes in the matrix:
normalized.agi.final <- normalized.arabidopsis.unambiguous[!duplicated(normalized.arabidopsis.unambiguous[,1]),]

# To assign the AGI number as row name:
rownames(normalized.agi.final) <- normalized.agi.final[,1]
normalized.agi.final <- normalized.agi.final[,-1]

# The resulting gene expression dataset contains unique row identifiers
# (i.e. AGI locus), and the expression values obtained from the different
# experiments in each column.
# To export this data matrix from R to a tab-delimited file use the following
# command. The file will be written to the folder that you set up as your
# working directory in R using the setwd() command:
write.table(normalized.agi.final, "RMA.txt", sep="\t", col.names=NA, quote=FALSE)
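
To show where genes can get lost in the filtering steps above, here is a minimal base-R sketch that runs the same merge and filter steps on toy data and counts the rows left after each one. The probeset IDs, loci and expression values are made up for illustration (they are not taken from the real TAIR file), and it assumes that ambiguous probesets list several loci separated by ";".

```r
# Toy annotation table: hypothetical probesets and loci, for illustration only.
probe_agi <- data.frame(
  probeset = c("p1", "p2", "p3", "p4", "p5"),
  locus    = c("AT1G01010", "AT1G01010", "AT1G01020;AT1G01030",
               "no_match", "AT2G01008"),
  stringsAsFactors = FALSE
)

# Toy expression matrix with probeset IDs as row names (one made-up sample).
norm.data <- matrix(c(7.1, 6.9, 8.2, 5.0, 9.3), ncol = 1,
                    dimnames = list(c("p1", "p2", "p3", "p4", "p5"), "sample1"))

# Same steps as in the script above, keeping every intermediate object.
merged      <- merge(probe_agi, norm.data, by.x = 1, by.y = 0)[, -1]
in_genome   <- merged[grep("AT", merged[, 1]), ]
unambiguous <- in_genome[grep(";", in_genome[, 1], invert = TRUE), ]
final       <- unambiguous[!duplicated(unambiguous[, 1]), ]

# Number of rows remaining after each step.
counts <- c(merged      = nrow(merged),
            in_genome   = nrow(in_genome),
            unambiguous = nrow(unambiguous),
            unique_loci = nrow(final))
print(counts)
```

Running the same counts on the real objects (normalized.names, normalized.arabidopsis, and so on) should show at which step the number of genes starts to differ from the expected one.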