DESeq and Limma+Voom Normalization for Rna-Seq Data Using Ercc Spike-In
1
0
Entering edit mode
6.2 years ago
komal.rathi ★ 3.9k

Hello everyone,

I have some RNASeq data, that has been spiked-in with exogenous ERCC controls. I have the counts from htseq-count and now I want to normalize + test for differential expression. For this, I am using two methods: DESeq and limma+voom. Below is my code for both the tests. Can someone tell me if what I am doing is correct or not? I do think that I am slightly mistaken in the voom code.

##### Using DESeq
library(DESeq)
# get the size factors from the spiked-ins
# dat.spike.in is count data of only spike ins
cds <- newCountDataSet(countData = dat.spike.in, conditions = as.character(condition))
cds <- estimateSizeFactors(cds)
nf <- sizeFactors(cds)

# add the size factors to data without spike-ins
# dat.without.spike.in is count data where spike-ins are removed
cds <- newCountDataSet(countData = dat.without.spike.in, conditions = as.character(condition))
sizeFactors(cds) <- nf

# differential expression test
cds <- estimateDispersions( cds, method= 'pooled', sharingMode='maximum', fitType='local')
deseq.res <- nbinomTest( cds, "A", "B")

##### Using Limma + Voom
library(limma)
library(edgeR)
groups = as.factor(condition) # conditions
design <- model.matrix(~0+groups)
colnames(design) = levels(groups)
nf <- calcNormFactors(dat.spike.in) # calculation norm factors using spike ins only
voom.norm <- voom(dat.without.spike.in, design, lib.size = colSums(dat.spike.in) * nf) # add that to voom

Thanks!

RNA-Seq spike-in deseq limma voom • 4.8k views
1
Entering edit mode

I can at least say that the DESeq code is correct (though you should be using DESeq2 instead). It's been long enough since I've used limma/voom that I'll wait for someone else to comment.

0
Entering edit mode

Thank you, I still don't know if this is the best way i.e. normalizing with ERCC spike-ins. But at least this gives me some starting point.

0
Entering edit mode

In general, I would only normalize against the spike-ins if absolutely needed (there are definite cases where this is needed). They're generally less robust.

1
Entering edit mode

I am using Limma and Voom and I think I also have similar workflow like yours.

0
Entering edit mode

Thank you! Do you know of any reference papers that have used the same?

0
Entering edit mode

Why calcNormFactors only use dat.spike.in?

0
Entering edit mode

Because the ERCC spike-ins are being used for normalization. This is needed in cases like single-cell sequencing or whenever else there might be transcriptional amplification.

4
Entering edit mode
6.2 years ago
Gordon Smyth ★ 3.9k