Question: DESeq and Limma+Voom Normalization for Rna-Seq Data Using Ercc Spike-In
0
komal.rathi • 3.7k wrote:
Hello everyone,
I have some RNASeq data, that has been spiked-in with exogenous ERCC controls. I have the counts from htseq-count and now I want to normalize + test for differential expression. For this, I am using two methods: DESeq and limma+voom. Below is my code for both the tests. Can someone tell me if what I am doing is correct or not? I do think that I am slightly mistaken in the voom code.
##### Using DESeq library(DESeq) # get the size factors from the spiked-ins # dat.spike.in is count data of only spike ins cds <- newCountDataSet(countData = dat.spike.in, conditions = as.character(condition)) cds <- estimateSizeFactors(cds) nf <- sizeFactors(cds) # add the size factors to data without spike-ins # dat.without.spike.in is count data where spike-ins are removed cds <- newCountDataSet(countData = dat.without.spike.in, conditions = as.character(condition)) sizeFactors(cds) <- nf # differential expression test cds <- estimateDispersions( cds, method= 'pooled', sharingMode='maximum', fitType='local') deseq.res <- nbinomTest( cds, "A", "B") ##### Using Limma + Voom library(limma) library(edgeR) groups = as.factor(condition) # conditions design <- model.matrix(~0+groups) colnames(design) = levels(groups) nf <- calcNormFactors(dat.spike.in) # calculation norm factors using spike ins only voom.norm <- voom(dat.without.spike.in, design, lib.size = colSums(dat.spike.in) * nf) # add that to voom
Awaiting your input and suggestions!
Thanks!
ADD COMMENT
• link
•
modified 5.2 years ago
by
Gordon Smyth • 2.1k
•
written
5.2 years ago by
komal.rathi • 3.7k
I can at least say that the DESeq code is correct (though you should be using DESeq2 instead). It's been long enough since I've used limma/voom that I'll wait for someone else to comment.
Thank you, I still don't know if this is the best way i.e. normalizing with ERCC spike-ins. But at least this gives me some starting point.
In general, I would only normalize against the spike-ins if absolutely needed (there are definite cases where this is needed). They're generally less robust.
I am using Limma and Voom and I think I also have similar workflow like yours.
Thank you! Do you know of any reference papers that have used the same?
Why calcNormFactors only use
dat.spike.in
?Because the ERCC spike-ins are being used for normalization. This is needed in cases like single-cell sequencing or whenever else there might be transcriptional amplification.