Question: Limma help (creating design matrix)
7 weeks ago by
d808bc070
d808bc070 wrote:

Hi, I'm trying to use limma to get a list of differentially expressed genes between two subtypes (GCB and ABC) of B-cell lymphoma. I have a dataset from an Affymetrix array with gene expression measurements for 271 samples, of which 71 were classified as ABC, 144 as GCB, and 56 were unclassified.

I've managed to read my CEL files and perform normalisation on the gene expression data:

norm.batch = rma(batch)
dat = exprs(norm.batch)

But I'm just not sure where to start with creating a design matrix for limma analysis of this data.

7 weeks ago by
ATpoint46k
ATpoint46k wrote:

Hello, take a look at this example:

library(limma)

#/ Some dummy annotations ("NA" is a literal level name standing in for unclassified samples):
samplesheet <- data.frame(Samples = paste0("Sample_", seq(1, 9)),
                          Subtype = factor(c(rep("GCB", 3), rep("ABC", 3), rep("NA", 3))))

samplesheet$Subtype # ABC is the reference level

#/ Copied from limma vignette, simulate data for three groups with n=3 each
y <- matrix(rnorm(100*9,sd=sqrt(0.05 / rchisq(100, df=10) * 10)),100,9)
rownames(y) <- paste0("Gene", seq(1,nrow(y)))
colnames(y) <- samplesheet$Samples
head(y)

# make the design by modelling subtype:
design <- model.matrix(~Subtype, data = samplesheet)

#/ standard limma workflow, see manual:
fit <- lmFit(y,design)
fit <- eBayes(fit)
topTable(fit, coef = 2) # coef=2 is the 2nd column of the design (SubtypeGCB), i.e. GCB vs ABC

Since ABC is the reference level (= the denominator of the comparison), genes with logFC > 0 are higher in GCB, and genes with logFC < 0 are higher in ABC. Does that make sense to you?
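To make the sign convention concrete, here is a minimal base R sketch (no limma required) with made-up values for a single gene. It assumes the default alphabetical level ordering, so ABC becomes the reference, and uses `lm.fit` as a stand-in for the per-gene least-squares fit; the variable names are only illustrative.

```r
# Toy data: 3 ABC and 3 GCB samples for one gene (values are made up)
subtype <- factor(c("ABC", "ABC", "ABC", "GCB", "GCB", "GCB"))
expr <- c(5.0, 5.2, 4.8, 7.1, 6.9, 7.0)

levels(subtype)[1]   # the first level is the reference level

design <- model.matrix(~subtype)
colnames(design)     # intercept column plus one column for GCB

# Ordinary least-squares fit for this single gene
coefs <- lm.fit(design, expr)$coefficients

# The second coefficient is the difference in group means, GCB minus ABC
gcb_minus_abc <- mean(expr[subtype == "GCB"]) - mean(expr[subtype == "ABC"])

# Making GCB the reference flips the sign of that coefficient
subtype_gcb_ref <- relevel(subtype, ref = "GCB")
coefs_flipped <- lm.fit(model.matrix(~subtype_gcb_ref), expr)$coefficients
```

If the sign of logFC looks reversed in real results, checking `levels()` of the factor and using `relevel()` is usually the quickest way to confirm which group is the reference.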

7 weeks ago by
d808bc070
d808bc070 wrote:

Hi, thanks for the reply! I'm not sure how to relate this to my data. I have a CSV with one column holding the ".CEL" file name for each sample and another column holding the subtype classification. I'm trying to link the subtype classification to the corresponding sample in the design matrix. This is the code I used:

Data <- read.affybatch(dir(patt="CEL")) 
eset <- rma(Data)
pData(eset)

subtype <- clinical$subtype #clinical is the CSV containing the array file names and subtype information 
design <- model.matrix(~factor(subtype))
colnames(design) <- c("GCB","ABC",  "unclassified")

fit <- lmFit(eset, design)
fit <- eBayes(fit)
topTable(fit,coef=2)

I don't think my matrix is correct, and I'm not sure what to do with the unclassified samples. Should I remove them and just compare the GCB and ABC subtypes?
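One possible way to link the CSV to the arrays, sketched in base R on simulated data. Everything here is an assumption for illustration: the column names `filename` and `subtype`, the toy matrix standing in for `exprs(eset)`, and the idea that the column names of the expression matrix are the CEL file names. Dropping the unclassified samples is only one option; keeping them as a third group and testing the GCB coefficient would also be a valid design.

```r
set.seed(1)

# Hypothetical stand-in for the CSV: one row per array (column names are assumed)
clinical <- data.frame(
  filename = paste0("array", 6:1, ".CEL"),   # deliberately not in matrix order
  subtype  = c("GCB", "unclassified", "ABC", "GCB", "ABC", "unclassified"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for exprs(eset): genes in rows, arrays in columns
expr <- matrix(rnorm(20 * 6), nrow = 20,
               dimnames = list(paste0("Gene", 1:20),
                               paste0("array", 1:6, ".CEL")))

# Reorder the annotation so that row i describes column i of the matrix
clinical <- clinical[match(colnames(expr), clinical$filename), ]
stopifnot(identical(clinical$filename, colnames(expr)))

# Keep only the classified samples
keep <- clinical$subtype %in% c("ABC", "GCB")
expr_sub <- expr[, keep]

# Set the level order explicitly so ABC is the reference
subtype <- factor(clinical$subtype[keep], levels = c("ABC", "GCB"))

# Name the columns after what the coefficients mean, not after the groups
design <- model.matrix(~subtype)
colnames(design) <- c("Intercept", "GCBvsABC")

# With limma installed, the fit would then follow the usual steps:
# fit <- eBayes(lmFit(expr_sub, design))
# topTable(fit, coef = "GCBvsABC")
```

Note that in a design built with `~factor(subtype)`, the first column is the intercept (the reference group mean) and the remaining columns are differences from the reference, so renaming the columns to plain group names can be misleading. The `match()` step matters because the rows of the CSV are not guaranteed to be in the same order as the arrays.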
