Question: What is the best way to analyze non-normalized Illumina HumanHT-12 microarray?
0
gravatar for Leite
14 months ago by
Leite410
Leite410 wrote:

Hello everyone,

I tried to help on a post one month ago, but without success.

I continue to have problems analyzing the HumanHT-12 chip, I'm trying to follow this code by @Gordon Smyth.

My biggest problem is to split the groups, how to tell the R what Healthy controls and the patients.

> library(limma)
> x <- read.ilmn("GSE74629_non-normalized.txt",expr="SAMPLE ",probeid="ID_REF")
Reading file GSE74629_non-normalized.txt ... ...
> y <- neqc(x)
Note: inferring mean and variance of negative control probe intensities from the
detection p-values.
> Group <- rep(c("PDAC","Healthy"),c(36,14))
> Group <- factor(Group)
> design <- model.matrix(~Group)
> keep <- rowSums(y$E>5) >= 14
> y2 <- y[keep,]
> fit <- lmFit(y2,design)
> fit <- eBayes(fit,trend=TRUE,robust=TRUE)
> topTable(fit,coef=2)

Can I use something like targets <- readTargets ("targets.txt") ? and then say what are the controls and patients:

#Build the design matrix for the linear modelling function
f <- factor(targets$Target, levels = unique(targets$Target))
design <- model.matrix(~0 + f)
colnames(design) <- levels(f)

#Create a contrast matrix
contrast.matrix <- makeContrasts("patients-control", levels=design)

Thank you all,

Leite

ADD COMMENTlink written 14 months ago by Leite410
1

Can I use something like targets <- readTargets ("targets.txt") ? and then say what are the controls and patients

Yes you can. Take note that topTable() is deprecated in favour of topTreat() - the post you are following is quite old, I would suggest you follow the limma Users Guide instead.

ADD REPLYlink written 14 months ago by h.mon26k

Hey h.mon, ty so much. I'll try to do this. the difference topTreat() and topTable() is that in topTreat I can use more parameters to filter, or have more things?

ADD REPLYlink written 14 months ago by Leite410
1

The difference is topTreat() better controls false discovery rate, read the Note section on ?topTreat.

ADD REPLYlink written 14 months ago by h.mon26k

Thank you so much again.

ADD REPLYlink written 14 months ago by Leite410
1

Just to add: The example that Gordon posted (on Bioconductor) produces the same type of model matrix that we normally create via information supplied via readTargets. One can create the model matrix in any way, though. You do not necessarily have to use readTargets.

For limma, for any array source, you just need to get your sample files into a data matrix, with samples as columns and genes as rows. Then, you just need to specify a model matrix (via any means) that defines your sample groupings.

Be aware, also, that Gordon may have only be supplying sample code and that the line Group <- rep(c("PDAC","Healthy"),c(36,14)), i.e., 36 PAC versus 14 controls, may not reflect the actual numbers of these groups in the dataset

ADD REPLYlink modified 14 months ago • written 14 months ago by Kevin Blighe46k

Hey Kevin,

Thank you so much for your reply. I decided to use the normalized data, it seemed to be the best and easiest way to analyze this data. I made a breakthrough this weekend and my R code is like this:

#Configure o diretório de trabalho
setwd("C:/Users/")

#Read the targets
library(limma)
library(affy)

eset <- readExpressionSet("GEOdata_normalised.txt", header=TRUE)
read.delim("Targets.txt", header=TRUE)

#Build the design matrix for the linear modelling function
f <- factor(targets$Target, levels = unique(targets$Target))
design <- model.matrix(~0 + f)
colnames(design) <- levels(f)

#Apply the intensity values to lmFit
fit <- lmFit(eset, design)
write.table(fit, file="fit.txt", sep="\t", quote=FALSE)

#Create a contrast matrix
contrast.matrix <- makeContrasts("patients-control", levels=design)

#Apply this contrast matrix to the modeled data and compute statistics for the data
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)

#Output the statistics for the dataset and write them to disk
output <- topTreat(fit2, coef=1, number=Inf, adjust.method="BH", lfc=0.42)
write.table(output, file="alive-Control.txt", sep="\t", quote=FALSE)

I'm studying to see if there's any mistake in it, what do you guys think?

Best,

Leite

ADD REPLYlink modified 14 months ago • written 14 months ago by Leite410
1

Boa tade Leite, and how do the p-values appear? A good measure is to take the most differentially expressed genes and see if they separate your groups in clstering + heatmap.

ADD REPLYlink written 14 months ago by Kevin Blighe46k

Boa tarde Kevin,

I can find many genes with an adjusted p <0.05. I'll try what you mentioned. What does not make me very comfortable with this code is this function eset <- readExpressionSet I'm still studying more about it.

ADD REPLYlink written 14 months ago by Leite410
1

That function appears to just 'coerce' (convert) a data matrix into an Expression Set object, which is require for Limma.

The only other thing that you need to ensure is that your column names (in eset) are in the same order as your samples in targets.

ADD REPLYlink written 14 months ago by Kevin Blighe46k
1

Oi Kevin,

This point I had not thought of. "The only other thing that you need to ensure is that your column names (in eset) are in the same order as your samples in targets."

Thank you very much for this super tip, it made all the difference now.

ADD REPLYlink written 14 months ago by Leite410
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1523 users visited in the last hour