Question

Limma-Voom which of two models (and p-values) is correct?

0

Entering edit mode

7.6 years ago

javiergutierreza ▴ 40

Dear Biostars, I'm analysing RNA-seq data with Limma-Voom and after reading and apply different methods of analysis I have arrived to two different results and I'm not sure which one is correct. Here the specs: n=60 (~30 per tissue) The experiment (Multi-level time course experiment):

dataForAnalysis = data.frame(Sample=c("s_1", "s_1", "s_2", "s_2", "s_1", "s_1", "s_2", "s_2"), Tissue=c("A", "A","A", "A", "B", "B", "B", "B"), Condition=c("before", "after","before", "after","before", "after","before", "after"))

Sample Tissue Condition(Sampling time)

s_1 A before

s_1 A after

s_2 A before

s_2 A after

s_1 B before

s_1 B after

s_2 B before

s_2 B after

Analysis No. 1

designMatrix = model.matrix(~ Tissue + Condition, dataForAnalysis)

nf = calcNormFactors(counts)

forDgeData.voom <- voom(counts, designMatrix, plot=False,  lib.size=colSums(counts)*nf)

forDgeData.dups = duplicateCorrelation(forDgeData.voom, designMatrix, block= dataForAnalysis$Sample)

forDgeData.fitRobust = lmFit(forDgeData.voom,  forDgeData.voom$design, correlation= forDgeData.dups$consensus, block= dataForAnalysis$Sample, method='robust')

 forDgeData.ebayes = eBayes(forDgeData.fitRobust, robust=True)

Using this method, these are my results:

summary(decideTests(forDgeData.ebayes))

(Intercept) TissueA ConditionAfter

-1 1615 6594 22

0 446 1573 15631

1 13631 7525 39

From these results, I'm taking the ones in the "ConditionAfter" to get the DE genes between the conditions using:

toptable(forDgeData.ebayes, coef="ConditionAfter", ...)

Analysis No. 2 (For this analysis I pasted tissue and condition for the design matrix and use contrasts)

predesignMatrix = factor(paste(dataForAnalysis$Tissue, dataForAnalysis$Condition, sep='.'))

designMatrix = model.matrix(~ 0 + predesignMatrix)

nf = calcNormFactors(counts)

forDgeData.voom <- voom(counts, designMatrix, plot=False,  lib.size=colSums(counts)*nf)

forDgeData.dups = duplicateCorrelation(forDgeData.voom, designMatrix, block= dataForAnalysis$Sample)

forDgeData.fitRobust = lmFit(forDgeData.voom,  forDgeData.voom$design, correlation= forDgeData.dups$consensus, block= dataForAnalysis$Sample, method='robust')

 forDgeData.ebayes = eBayes(forDgeData.fitRobust, robust=True)


designMatrix.forContrast = makeContrasts(diff_Before_vs_After = (A.after-A.before)-(B.after-B.before), levels= designMatrix)

forDgeData.fitRobust.fitRobust2 = contrasts.fit(forDgeData.fitRobust, designMatrix.forContrast)

forDgeData.ebayes = eBayes(forDgeData.fitRobust.fitRobust2)

Using this method, these are my results:

summary(decideTests(forDgeData.ebayes))

diff_Before_vs_After

-1 7

0 15680

1 5

So, comparing results from Analysis No. 1 and No. 2, only 3 genes overlap. My main question here is if there are DE genes between the conditions (before and after), taking into account the tissue factor and also that the samples are duplicated.

Any help or advice... more than welcome!

Thanks

RNA-Seq rna-seq • 2.3k views

ADD COMMENT • link updated 7.6 years ago by Devon Ryan 104k • written 7.6 years ago by javiergutierreza ▴ 40

score 0 · Answer 1 · 2016-10-03

0

Entering edit mode

7.6 years ago

Devon Ryan 104k

In the first model, you're asking for DE genes due to Condition. In the second model, you're asking for DE genes due to the interaction between Condition and Tissue. These are completely different questions, so it's unsurprising that there's a difference.

ADD COMMENT • link 7.6 years ago by Devon Ryan 104k