Question

deseq2 LRT vs Wald test results for DE expression among 4 conditions

2

Entering edit mode

6.6 years ago

chrysanthi.taxiarchi12 ▴ 20

Hi,

I have RNAseq data from 4 developmental conditions (let's say: 1, 2, 3, 4), 3 replicates each. I would like to find genes that are a) differentially expressed in stepwise comparisons but also b) genes that are enriched -and specific- in each stage (1,2,3,4).

For that, I've used deseq2 and after I input all the data together (to get normalisation across all samples),

a) I performed Wald test in pairwise comparisons 1vs2, 2vs3, 3vs4 and used the logFC and padj for stepwise differential expression and

b) I performed LRT test (ddsLRT <- DESeq(dds, test="LRT", reduced= ~ 1) because I don't have more variables e.g. treated vs untreated) and then I also performed Wald test for all the 6 different pairwise comparisons possible (1vs2, 1vs3, 1vs4, 2vs3, 2vs4, 3vs4). From the LRT results, 70%(!) of the genes have a padj<0.05 (I still have not understood how the logFC can be interpreted and used in this case - any input would be highly appreciated). What I find striking is that many of those genes do not show any statistically significant difference in any of the pairwise comparisons and vise versa.

If I use the individual wald tests and the intersections of them to find e.g. genes enriched genes in condition 1 by doing: logFC>1, padj<0.05 in 1vs2 and 1vs3 and 1vs4, I also get genes that do not show a significant padj in the LRT test.

I am struggling to find a way to actually apply the right statistical to find the actual differentially expressed genes that are "enriched"/show a peak in one condition. Any ideas please??

Thanks,

Chryssa

RNA-Seq deseq2 • 5.2k views

ADD COMMENT • link updated 6.6 years ago by Devon Ryan 104k • written 6.6 years ago by chrysanthi.taxiarchi12 ▴ 20

score 3 · Answer 1 · 2017-09-13

Fold-changes in your LRT are from the last column in the model matrix, which is presumably stage 4. There's no meaningful fold-change that really describes such an LRT.

There's no reason that a gene DE in the LRT has to be DE at any given time point. What you're testing is whether "including stage in the model significantly improves the fit". You can have a bunch of subthreshold changes at each stage that then are significant in the LRT (e.g., consider a small increase in expression at each stage).

Don't intersect test results, there are metrics designed specifically to assess how enriched one group is versus multiple others. One such example would be the tau metric (see this paper for a comparison).