Question

Limma help (creating design matrix)

3.3 years ago

d808bc07 • 0

Hi, I'm trying to use limma to get a list of differentially expressed genes between two subtypes (GCB and ABC) of B-cell lymphoma. I have a dataset from an Affymetrix array with gene expression values for 271 samples, of which 71 were classified as ABC, 144 as GCB and 56 were unclassified.

I've managed to read my CEL files and perform normalisation on the gene expression data:

norm.batch = rma(batch)
dat = exprs(norm.batch)

But I'm just not sure where to start with creating a design matrix for limma analysis of this data.


score 0 · Answer 1 · 2021-01-15

Hello, take a look at this example:

library(limma)

#/ Some dummy annotations:
samplesheet <- data.frame(Samples=paste0("Sample_", seq(1,9)), 
                          Subtype=factor(c(rep("GCB", 3), rep("ABC", 3), rep("NA", 3))))

samplesheet$Subtype # ABC is the reference level

#/ Copied from limma vignette, simulate data for three groups with n=3 each
y <- matrix(rnorm(100*9,sd=sqrt(0.05 / rchisq(100, df=10) * 10)),100,9)
rownames(y) <- paste0("Gene", seq(1,nrow(y)))
colnames(y) <- samplesheet$Samples
head(y)

# make the design by modelling subtype:
design <- model.matrix(~Subtype, data = samplesheet)

#/ standard limma workflow, see manual:
fit <- lmFit(y,design)
fit <- eBayes(fit)
topTable(fit,coef=2) # coef=2 is the 2nd column of the design (SubtypeGCB), i.e. GCB vs ABC

Since ABC is the reference level (i.e. the denominator of the fold change), genes with logFC > 0 are higher in GCB and genes with logFC < 0 are higher in ABC. Does that make sense to you?
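To check which level is the reference before interpreting the sign of logFC, it can help to look at the factor levels and the design column names directly. The sketch below uses only base R and a made-up nine-sample sheet (the third group is labelled "Unclassified" here, and the names `sheet`, `design_abc_ref` and `design_gcb_ref` are illustrative, not from the post). It assumes the default alphabetical ordering of levels; `relevel()` switches the reference, which flips the direction of the comparison in column 2.

```r
# Hypothetical sample sheet with three groups of three samples
sheet <- data.frame(
  Samples = paste0("Sample_", 1:9),
  Subtype = factor(c(rep("GCB", 3), rep("ABC", 3), rep("Unclassified", 3)))
)

# Levels are sorted alphabetically by default, so ABC comes first (= reference)
levels(sheet$Subtype)

# With ABC as reference, column 2 of the design is the GCB - ABC difference
design_abc_ref <- model.matrix(~Subtype, data = sheet)
colnames(design_abc_ref)

# Making GCB the reference instead: column 2 becomes the ABC - GCB difference
sheet$Subtype_gcb <- relevel(sheet$Subtype, ref = "GCB")
design_gcb_ref <- model.matrix(~Subtype_gcb, data = sheet)
colnames(design_gcb_ref)
```

Whichever level is the reference, the coefficient named after the other level is "that level minus the reference", so a positive logFC means higher expression in the non-reference group.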

score 0 · Answer 2 · 2021-01-15

Hi, thanks for the reply! I'm not sure how to relate this to my data. I have a CSV with one column for the ".CEL" file name of each sample and another column with the subtype classification, and I'm trying to link the subtype classification to the sample in the design matrix. This is the code I used:

Data <- read.affybatch(dir(patt="CEL")) 
eset <- rma(Data)
pData(eset)

subtype <- clinical$subtype #clinical is the CSV containing the array file names and subtype information 
design <- model.matrix(~factor(subtype))
colnames(design) <- c("GCB","ABC",  "unclassified")

fit <- lmFit(eset, design)
fit <- eBayes(fit)
topTable(fit,coef=2)

I don't think my design matrix is correct, and I'm not sure what to do with the unclassified samples. Should I remove them and just compare the GCB and ABC subtypes?
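One possible way to set this up, sketched with base R and made-up data: reorder the clinical table so its rows follow the column order of the expression matrix, drop the unclassified samples, and build the design from what remains. Note that with `~factor(subtype)` the first column of the design is the intercept (the mean of the reference level), so renaming the columns to the three subtype names mislabels them. The column names `file` and `subtype`, the labels "ABC", "GCB" and "unclassified", and the toy `expr` matrix are assumptions standing in for the real CSV and `exprs(eset)`. Dropping the unclassified samples is only one option; keeping them as a third group is also valid and lets them contribute to the variance estimate.

```r
# Toy stand-ins (assumptions): 'expr' plays the role of exprs(eset),
# 'clinical' the role of the CSV with file names and subtypes
set.seed(1)
expr <- matrix(rnorm(20 * 6), nrow = 20,
               dimnames = list(paste0("probe", 1:20),
                               paste0("s", 1:6, ".CEL")))
clinical <- data.frame(
  file    = paste0("s", c(3, 1, 6, 2, 5, 4), ".CEL"),  # deliberately shuffled
  subtype = c("GCB", "ABC", "unclassified", "GCB", "ABC", "unclassified"),
  stringsAsFactors = FALSE
)

# 1. Put the clinical rows in the same order as the expression columns
clinical <- clinical[match(colnames(expr), clinical$file), ]
stopifnot(identical(clinical$file, colnames(expr)))

# 2. Keep only the classified samples, in both objects
keep     <- clinical$subtype %in% c("ABC", "GCB")
expr_sub <- expr[, keep]
subtype  <- factor(clinical$subtype[keep], levels = c("ABC", "GCB"))  # ABC = reference

# 3. Intercept design: column 2 is the GCB - ABC difference
design <- model.matrix(~subtype)
colnames(design)

# 4. Fit with limma if it is installed (same calls as earlier in the thread)
if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::eBayes(limma::lmFit(expr_sub, design))
  print(limma::topTable(fit, coef = "subtypeGCB"))
}
```

The `stopifnot()` check in step 1 is the important safeguard: the design rows must describe the samples in exactly the order of the expression columns, otherwise subtypes end up assigned to the wrong arrays without any error.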