Limma help (creating design matrix)
3.3 years ago
d808bc07 • 0

Hi, I'm trying to use limma to get a list of genes differentially expressed between two subtypes of B-cell lymphoma, GCB and ABC. I have an Affymetrix array dataset with gene expression data for 271 samples: 71 classified as ABC, 144 as GCB and 56 unclassified.

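I've managed to read my CEL files and normalise the gene expression data:

library(affy)              # rma(); also attaches Biobase, which provides exprs()
norm.batch <- rma(batch)   # 'batch' is the AffyBatch read from the CEL files
dat <- exprs(norm.batch)   # log2 expression matrix: probe sets x arrays

But I'm not sure where to start with creating a design matrix for a limma analysis of this data.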
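For orientation, here is a small base-R sketch of how the group sizes above translate into a design matrix with one row per array and one column per coefficient. The label strings "ABC", "GCB" and "unclassified" are assumptions, and the labels here are grouped only to reproduce the counts; in the real factor they must follow the column order of dat.

```r
# Group sizes from the question; label strings are assumed, and the
# grouped order is only for counting (real labels follow colnames(dat))
subtype <- factor(rep(c("ABC", "GCB", "unclassified"), times = c(71, 144, 56)))
table(subtype)   # ABC 71, GCB 144, unclassified 56

# One row per array, one column per model coefficient
design <- model.matrix(~subtype)
dim(design)      # 271 3

# lmFit() expects the design rows in the same order as the expression
# columns, so with the real data it is worth checking nrow(design) == ncol(dat)
```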

3.3 years ago
ATpoint 82k

Hello, take a look at this example:

library(limma)

#/ Some dummy annotations:
samplesheet <- data.frame(Samples=paste0("Sample_", seq(1,9)), 
                          Subtype=factor(c(rep("GCB", 3), rep("ABC", 3), rep("NA", 3))))

samplesheet$Subtype # ABC is the reference level

#/ Copied from limma vignette, simulate data for three groups with n=3 each
y <- matrix(rnorm(100*9,sd=sqrt(0.05 / rchisq(100, df=10) * 10)),100,9)
rownames(y) <- paste0("Gene", seq(1,nrow(y)))
colnames(y) <- samplesheet$Samples
head(y)

# make the design by modelling subtype:
design <- model.matrix(~Subtype, data = samplesheet)

#/ standard limma workflow, see manual:
fit <- lmFit(y,design)
fit <- eBayes(fit)
topTable(fit,coef=2) # coef=2 is the 2nd design column (SubtypeGCB), i.e. GCB vs ABC

Since ABC is the reference level (i.e. the denominator of the fold change), genes with logFC > 0 are higher in GCB and genes with logFC < 0 are higher in ABC. Does that make sense to you?
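A minimal base-R sketch of what coef=2 estimates, using simulated values for one hypothetical gene so that limma itself is not needed: with ABC as the reference level, the GCB coefficient of the per-gene linear model (the kind of least-squares fit lmFit performs for every gene) is simply mean(GCB) minus mean(ABC) on the log2 scale, i.e. the logFC reported by topTable.

```r
# Simulated log2 expression values for one hypothetical gene (not real data)
set.seed(1)
subtype <- factor(c(rep("GCB", 3), rep("ABC", 3), rep("NA", 3)))
expr <- c(rnorm(3, mean = 8), rnorm(3, mean = 6), rnorm(3, mean = 7))

# Same parameterisation as above: ABC is the reference level
design <- model.matrix(~subtype)
colnames(design)   # "(Intercept)" "subtypeGCB" "subtypeNA"

# Ordinary least-squares fit for this single gene
fit <- lm.fit(design, expr)
logFC <- unname(fit$coefficients["subtypeGCB"])

# The coefficient is the difference of group means on the log2 scale
mean_diff <- mean(expr[subtype == "GCB"]) - mean(expr[subtype == "ABC"])
all.equal(logFC, mean_diff)   # TRUE

# Back-transformed to a linear fold change (GCB relative to ABC)
2^logFC
```

eBayes() only moderates the gene-wise variances, so it leaves this estimate unchanged; it affects the t-statistics and p-values.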

3.3 years ago
d808bc07 • 0

Hi, thanks for the reply! I'm not sure how to relate this to my data. I have a CSV with one column giving the ".CEL" file name of each sample and another giving its subtype classification. I'm trying to link each sample's subtype to the corresponding row of the design matrix. This is the code I used:

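library(affy)    # read.affybatch(), rma()
library(limma)   # lmFit(), eBayes(), topTable()

Data <- read.affybatch(dir(pattern = "CEL"))   # read every CEL file in the working directory
eset <- rma(Data)                              # RMA-normalised ExpressionSet
pData(eset)                                    # sample annotation, one row per array

subtype <- clinical$subtype #clinical is the CSV containing the array file names and subtype information 
design <- model.matrix(~factor(subtype))
colnames(design) <- c("GCB","ABC",  "unclassified")

fit <- lmFit(eset, design)
fit <- eBayes(fit)
topTable(fit,coef=2)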

I don't think my design matrix is correct. I'm also not sure what to do with the unclassified samples: should I remove them and compare only the GCB and ABC subtypes?
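A minimal sketch of both steps, matching the CSV rows to the arrays and then either dropping or keeping the unclassified samples, using toy base-R stand-ins. The CSV column names filename and subtype, the label strings, and the toy data are all assumptions; with the real data, colnames(expr) would be sampleNames(eset), and file names may need basename() or tolower() before they match. The limma calls are left as comments so the sketch runs with base R alone.

```r
# Toy stand-ins (hypothetical, not the real data):
#   clinical ~ read.csv(...) with assumed columns 'filename' and 'subtype'
#   expr     ~ exprs(eset): genes x arrays, columns named by CEL file
set.seed(1)
cel_files <- paste0("S", 1:6, ".CEL")
expr <- matrix(rnorm(5 * 6), nrow = 5,
               dimnames = list(paste0("Gene", 1:5), cel_files))
clinical <- data.frame(
  filename = c("S4.CEL", "S1.CEL", "S6.CEL", "S2.CEL", "S5.CEL", "S3.CEL"),
  subtype  = c("GCB", "GCB", "unclassified", "ABC", "ABC", "unclassified"),
  stringsAsFactors = FALSE
)

# 1. Reorder the clinical rows so that row i describes array column i
idx <- match(colnames(expr), clinical$filename)
if (anyNA(idx)) stop("some CEL files have no row in the clinical table")
clinical <- clinical[idx, ]
subtype <- factor(clinical$subtype)   # levels sort alphabetically: ABC, GCB, unclassified

# 2. What ~factor(subtype) encodes: column 1 is the intercept (the ABC mean),
#    column 2 is GCB minus ABC, column 3 is unclassified minus ABC, so renaming
#    the columns to c("GCB", "ABC", "unclassified") mislabels all three
colnames(model.matrix(~subtype))      # "(Intercept)" "subtypeGCB" "subtypeunclassified"

# 3a. Option A: drop the unclassified arrays and compare GCB with ABC only
keep <- subtype != "unclassified"
expr_ab <- expr[, keep]                   # subset arrays and labels together
subtype_ab <- droplevels(subtype[keep])   # remove the now-empty level
design_ab <- model.matrix(~subtype_ab)    # "(Intercept)" "subtype_abGCB"
# limma: fit <- eBayes(lmFit(expr_ab, design_ab)); topTable(fit, coef = 2)

# 3b. Option B: keep every array, one column per group, then test GCB - ABC
design_all <- model.matrix(~0 + subtype)
colnames(design_all) <- levels(subtype)   # ~0 gives one column per level, in level order
contrast <- c(ABC = -1, GCB = 1, unclassified = 0)
# limma: fit <- eBayes(contrasts.fit(lmFit(expr, design_all), contrast)); topTable(fit)
```

In both options the per-gene GCB minus ABC estimate is the same difference of group means. Keeping the unclassified arrays (option B) changes only the residual variance estimate and its degrees of freedom, because those arrays then contribute to the pooled within-group variance; which option suits the data is a judgment call.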
