Question

Is variation between samples the reason for DESeq2 not calling DEGs significant?

3.8 years ago

molbioTH • 0

I recently ran a transcriptome analysis using DESeq2 with four sample groups, three reps each. One group was a negative control (N), one was a positive control (P), and two were experimental groups (E1, E2).

When comparing N and P, there were thousands of significant DEGs (padj < 0.05). But there were zero significant DEGs for E1 and only a small number for E2 when compared back to N.

When I look at the FPKM values, I am surprised that some genes in E2 vs N were not called statistically significant, and instead had a padj of 0.99. E.g. one gene had FPKMs of 2300, 650, and 430 in E2. The same gene had FPKMs of 600, 150, and 140 in N. Therefore, in every rep E2 had roughly 3x to 4x as much expression as N.

There are many other genes which follow this same pattern; in each replicate, E2 has at least twice the expression of N. However, the expression is variable between replicates.

Is this variation between samples the cause of these DEGs not being called statistically significant? And is there any way to address this?

RNA-Seq deseq2 • 724 views

updated 3.8 years ago by Kevin Blighe 87k • written 3.8 years ago by molbioTH • 0

Note that DESeq2 doesn't use FPKM values. Look at the normalized counts instead.

3.8 years ago by swbarnes2 14k
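For readers unsure what "normalized counts" means here, the following is a minimal Python sketch of the median-of-ratios idea that DESeq2's size-factor normalization is based on. It is an illustration only, not DESeq2's actual implementation; the function names (`size_factors`, `normalized_counts`), the sample names, and the count values are all hypothetical.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of raw counts,
            one per gene, in the same gene order for every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log geometric mean of each gene across samples.
    # Genes with a zero count in any sample are skipped (marked None).
    log_geo_means = []
    for g in range(n_genes):
        vals = [counts[s][g] for s in samples]
        if all(v > 0 for v in vals):
            log_geo_means.append(sum(math.log(v) for v in vals) / len(vals))
        else:
            log_geo_means.append(None)

    # Size factor of a sample = median ratio of its counts to the gene-wise
    # geometric means (computed on the log scale, then exponentiated).
    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - lgm
            for g, lgm in enumerate(log_geo_means)
            if lgm is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors

def normalized_counts(counts):
    """Divide each sample's raw counts by that sample's size factor."""
    sf = size_factors(counts)
    return {s: [c / sf[s] for c in counts[s]] for s in counts}

# Hypothetical raw counts: E2_1 was sequenced exactly twice as deeply as N_1.
raw = {
    "N_1":  [100, 200, 50, 400],
    "E2_1": [200, 400, 100, 800],
}
sf = size_factors(raw)
norm = normalized_counts(raw)
```

In this toy case the two samples differ only in sequencing depth, so their normalized counts come out equal. Unlike FPKM, this scaling does not divide by gene length, which is one reason FPKM values and DESeq2's normalized counts can tell different stories.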

score 1 · Answer 1 · 2020-06-29

You may want to re-check your pipeline because DESeq2 does not usually produce FPKM values.

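In any case, to answer your question: yes. A large amount of variation between replicates for a given gene inflates that gene's dispersion estimate, which widens the uncertainty around the fold change and pushes the p-value (and padj) up.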
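As a rough illustration of that point, here is a small Python sketch using the six FPKM values quoted in the question. It is not what DESeq2 does (DESeq2 fits a negative binomial model to raw counts, not FPKM); it only computes simple t-statistics on log2 values to show how replicate spread can swamp a fold change of more than 3x. The paired comparison assumes that replicate i of E2 was processed alongside replicate i of N, which the thread does not confirm.

```python
import math
from statistics import mean, stdev

# FPKM values for one gene, as quoted in the question.
e2 = [2300, 650, 430]
n = [600, 150, 140]

log_e2 = [math.log2(x) for x in e2]
log_n = [math.log2(x) for x in n]

# Difference of group means on the log2 scale (about 1.9, i.e. roughly 3.7x).
log2_fc = mean(log_e2) - mean(log_n)

# Unpaired (Welch-style) t-statistic: the within-group variance of both
# groups goes into the standard error.
se_unpaired = math.sqrt(
    stdev(log_e2) ** 2 / len(log_e2) + stdev(log_n) ** 2 / len(log_n)
)
t_unpaired = log2_fc / se_unpaired

# Paired t-statistic: ASSUMES replicate i of E2 is matched to replicate i
# of N (e.g. same batch). Only the spread of the per-replicate log ratios
# goes into the standard error.
diffs = [a - b for a, b in zip(log_e2, log_n)]
t_paired = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

print(f"log2 fold change: {log2_fc:.2f}")
print(f"unpaired t: {t_unpaired:.2f}")
print(f"paired t:   {t_paired:.2f}")
```

The unpaired statistic falls below about 2.78, the two-sided 5% critical value for 4 degrees of freedom, so this gene would not reach significance even before multiple-testing correction. The paired statistic is far larger because the replicate-to-replicate shift cancels out. If the replicates really are matched batches, the analogous step in DESeq2 would be to add a batch or replicate term to the design formula, but whether that applies here depends on how the experiment was run.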

Based on the information that you've provided, I imagine that your E1 and E2 groups do not form tight clusters on a PCA bi-plot of PC1 versus PC2. So, the underlying problem may be biological.

Kevin