Question: Microarray data analysis - pipeline (problem with custom CDF)
0
gravatar for lessismore
15 months ago by
lessismore520
Mexico
lessismore520 wrote:

Hello everybody,

i am analyzing for microarray data in Arabidopsis. I took advantage from the code here supplied (sharing some naive codes for microarray normalization in R with whom are too new in R alike me) but im ending up with a different number of analysed genes in my final expression file.

Then i got that the problem is that it has been used a custom CDF file. Could someone tell me how could i include the custom CDF file (supplied in ArrayExpress) in my analysis? here is the dataset i am looking at: https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-5632/files/

here is the code i used:

library(affy)

# To read all CEL files in the working directory:
Data <- ReadAffy()

eset <- rma(Data)
norm.data <- exprs(eset)

# The norm.data R object contains the normalized expression for every probeset in the ATH1 microarrays used in this example. In order to convert the probeset IDs to Arabidopsis gene identifiers, the file ftp://ftp.arabidopsis.org/home/tair/Microarrays/Affymetrix/affy_ATH1_array_elements-2010-12-20.txt download from the TAIR database and place in the folder with the microarray data. In order to avoid ambiguous probeset associations (i.e. probesets that have multiple matches to genes), we only used probes that match only one gene in the Arabidopsis genome.
affy_names <- read.delim("affy_ATH1_array_elements-2010-12-20.txt",header=T)

# Select the columns that contain the probeset ID and corresponding AGI number. Please note that the positions used to index the matrix depend on the input format of the array elements file. You can change these numbers to index the corresponding columns if you are using a different format:
probe_agi <- as.matrix(affy_names[,c(1,5)])

# To associate the probeset with the corresponding AGI locus:
normalized.names <-merge(probe_agi,norm.data,by.x=1,by.y=0)[,-1]

# To remove probesets that do not match the Arabidopsis genome:
normalized.arabidopsis <- normalized.names[grep("AT",normalized.names[,1]),]

# To remove ambiguous probes:
normalized.arabidopsis.unambiguous <- normalized.arabidopsis[grep(pattern="",normalized.arabidopsis[,1], invert=T),]

# In some cases, multiple probes match the same gene, due to updates in the annotation of the genome. To remove duplicated genes in the matrix:
normalized.agi.final <- normalized.arabidopsis.unambiguous[!duplicated(normalized.arabidopsis.unambiguous[,1]),]

# To assign the AGI number as row name:
rownames(normalized.agi.final) <- normalized.agi.final[,1]
normalized.agi.final <- normalized.agi.final[,-1]

#The resulting gene expression dataset contains unique row identifies (i.e. AGI locus), and different expression values obtained from different experiments on each column

# To export this data matrix from R to a tab-delimited file use the following command. The file will be written to the folder that you set up as your working directory in R using the setwd() command in line 1 above:
write.table (normalized.agi.final,"RMA.txt", sep="\t",col.names=NA,quote=F)
ADD COMMENTlink modified 14 months ago by mannoulag120 • written 15 months ago by lessismore520

Hi, I used your code to analyse my dataset then I want to see the final matrix but I have an error

View(normalized.agi.final)
Error in View(normalized.agi.final) : argument 'x' incorrect
ADD REPLYlink modified 14 months ago • written 14 months ago by mannoulag120

try to export and reimport it.

ADD REPLYlink written 14 months ago by lessismore520

yes I do it but I have 0 rows. to show the data, I do:

> normalized.agi.final

    [1] locus            GSM131576.CEL.gz GSM131577.CEL.gz GSM131578.CEL.gz GSM131579.CEL.gz GSM131580.CEL.gz
     [7] GSM131581.CEL.gz GSM131582.CEL.gz GSM131583.CEL.gz GSM131584.CEL.gz GSM131585.CEL.gz GSM131586.CEL.gz
    [13] GSM131587.CEL.gz GSM131588.CEL.gz GSM131589.CEL.gz GSM131590.CEL.gz GSM131591.CEL.gz GSM131592.CEL.gz
    [19] GSM131593.CEL.gz GSM131594.CEL.gz GSM131595.CEL.gz GSM131596.CEL.gz GSM131597.CEL.gz GSM131598.CEL.gz
    [25] GSM131599.CEL.gz GSM131600.CEL.gz GSM131601.CEL.gz GSM131602.CEL.gz GSM131603.CEL.gz GSM131604.CEL.gz
    [31] GSM131605.CEL.gz GSM131606.CEL.gz GSM131607.CEL.gz GSM131608.CEL.gz GSM131609.CEL.gz GSM131610.CEL.gz
    [37] GSM131611.CEL.gz GSM131612.CEL.gz GSM131613.CEL.gz GSM131614.CEL.gz GSM131615.CEL.gz GSM131616.CEL.gz
    [43] GSM131617.CEL.gz GSM131618.CEL.gz GSM131619.CEL.gz GSM131620.CEL.gz GSM131621.CEL.gz GSM131622.CEL.gz
    [49] GSM131623.CEL.gz GSM131624.CEL.gz GSM131625.CEL.gz GSM131626.CEL.gz GSM131627.CEL.gz GSM131628.CEL.gz
    [55] GSM131629.CEL.gz GSM131630.CEL.gz GSM131631.CEL.gz GSM131632.CEL.gz GSM131633.CEL.gz GSM131634.CEL.gz
    [61] GSM131635.CEL.gz GSM131636.CEL.gz GSM131637.CEL.gz GSM131638.CEL.gz GSM131639.CEL.gz GSM131640.CEL.gz
    [67] GSM131641.CEL.gz
    <0 lignes> (ou 'row.names' de longueur nulle)
ADD REPLYlink modified 14 months ago • written 14 months ago by mannoulag120

That one was the code i initially used because i had not a custom cdf, then i found it. So my pipeline was:

# load the cdf package in your R environment 
# make sure you set your working directory where you have all CEL files

library (Affy)

# In my case i have the custom CDF so..
Data <- ReadAffy(cdfname = "cdfname")

# Normalization
eset <- rma(Data)
norm.data <- exprs(eset)

# Export Data
write.table (norm.data, "norm.data.txt", sep="\t",col.names=NA,quote=F)
ADD REPLYlink written 14 months ago by lessismore520
1
gravatar for lessismore
14 months ago by
lessismore520
Mexico
lessismore520 wrote:

Solved myself.

Data <- ReadAffy(cdfname = "yourcdffilename")

Important: your cdf object needs to be present in your R environment. Normally there is a package to install to do it.

ADD COMMENTlink written 14 months ago by lessismore520
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 766 users visited in the last hour