Question

DESeq2: Multiple groups & Cook's distance cutoff

4.2 years ago

benjamin.saintpierre ▴ 20

Hello everyone,

I've looked through many threads, but none clearly answers my question. In a classical multi-group RNA-seq analysis, does the Cook's distance flagging take into account the groups you're comparing?

Here's my experiment: 4 different groups (3 replicates each, no batch effect, etc.), run through the DESeq2 pipeline with results(dds, contrast=c('condition','group1','group2')). I saw that a gene of interest was getting NA for both pvalue and padj. To understand why, I looked at the normalized counts, the raw counts and the Cook's distances. On the last metric, one of the group 3 samples clearly stands out as an outlier. So is that why I can't get a p-value for this gene, even though I'm currently comparing groups 1 and 2, not group 3?

Does anyone have an idea how to avoid this effect?

Thanks!

RNA-Seq Deseq2 • 1.4k views

updated 4.2 years ago by mike.deberardine ▴ 110 • written 4.2 years ago by benjamin.saintpierre ▴ 20

score 1 · Answer 1 · 2020-02-26

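From my understanding[*], DESeq2 reduces the "multiple testing problem" by filtering the gene list before the p-values are adjusted, using a statistic (the mean of normalized counts) that should be independent of the test statistic, i.e. the gene list is "independently filtered". The idea is that genes that are unlikely to produce a low p-value are removed before adjustment. Genes that have been independently filtered are given an adjusted p-value (padj) of NA but keep their pvalue; when pvalue is NA as well, the gene either has zero counts in every sample or was flagged as containing a count outlier by Cook's distance.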

You can turn off independent filtering in the call to the results() function (independentFiltering = FALSE), or increase the alpha threshold.

I naively expect you should lose power by doing either of those things.
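To check which mechanism is producing the NAs, something like the sketch below may help. It is not the original poster's pipeline: the data are simulated with makeExampleDESeqDataSet() as a stand-in for a 4-group, 3-replicate design, and the group labels are assumptions. The cooksCutoff argument is not mentioned above; it is the results() option that controls the Cook's distance flagging, which is computed over all samples in the fitted model and not only over the two groups in the contrast.

```r
library(DESeq2)

# Simulated stand-in for a 4 groups x 3 replicates experiment (assumed layout).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)
dds$condition <- factor(rep(paste0("group", 1:4), each = 3))
design(dds) <- ~ condition
dds <- DESeq(dds)

contrast <- c("condition", "group1", "group2")

# Default call: independent filtering and Cook's distance flagging both on.
res_default <- results(dds, contrast = contrast)

# Independent filtering off: padj is NA only where pvalue is already NA.
res_nofilter <- results(dds, contrast = contrast, independentFiltering = FALSE)

# Cook's distance flagging off: outlier-flagged genes keep their pvalue.
res_nocooks <- results(dds, contrast = contrast, cooksCutoff = FALSE)

# Compare how many genes end up with NA under each setting.
na_counts <- sapply(
  list(default = res_default, nofilter = res_nofilter, nocooks = res_nocooks),
  function(r) c(pvalue_NA = sum(is.na(r$pvalue)), padj_NA = sum(is.na(r$padj)))
)
print(na_counts)

# Per-gene, per-sample Cook's distances stored by DESeq().
cooks <- assays(dds)[["cooks"]]
gene <- rownames(dds)[1]  # placeholder for the gene of interest
print(round(cooks[gene, ], 3))
```

If the gene of interest regains a pvalue only in the cooksCutoff = FALSE call, the NA came from the outlier flag and not from independent filtering. Turning the flag off means the outlying count is kept in the fit, so it is worth looking at the counts for that gene before trusting the result.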

[*] My understanding of DESeq2 is, in its entirety: independent filtering and negative binomial Wald test with sample/condition-blind estimates of genewise dispersion