Question: Is variation between samples the reason DESeq2 is not calling DEGs significant?
molbioTH0 wrote (13 days ago):

I recently ran a transcriptome analysis using DESeq2 with four sample groups, three replicates each. One group was a negative control (N), one was a positive control (P), and two were experimental groups (E1, E2).

When comparing N and P, there were thousands of significant DEGs (padj < 0.05). But there were zero significant DEGs for E1 and only a small number for E2 when compared back to N.

When I look at the FPKM values, I am surprised that some genes in the E2 vs N comparison were not called statistically significant and instead had a padj of 0.99. For example, one gene had FPKMs of 2300, 650, and 430 in E2, and 600, 150, and 140 in N. So in every replicate, E2 had roughly 3 to 4 times the expression of N.

Many other genes follow the same pattern: in each replicate, E2 has at least twice the expression of N. However, expression varies considerably between replicates.
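The per-replicate arithmetic can be made concrete with a short Python sketch. It uses the FPKM values quoted above for one gene and assumes, as the phrase "in every rep" implies, that replicate 1 of E2 pairs with replicate 1 of N, and so on. It works on the log2 scale and compares a paired t-statistic with an unpaired (Welch) one. This is only an illustration: DESeq2 fits a negative binomial model to raw counts, not a t-test on FPKM.

```python
import math
from statistics import mean, stdev

# FPKM values for one gene, as quoted in the question (replicates 1-3)
e2 = [2300, 650, 430]
n = [600, 150, 140]

log_e2 = [math.log2(x) for x in e2]
log_n = [math.log2(x) for x in n]

# Per-replicate fold changes, ASSUMING replicate i of E2 pairs with replicate i of N
ratios = [a / b for a, b in zip(e2, n)]
paired_diffs = [a - b for a, b in zip(log_e2, log_n)]

# Paired t-statistic: mean log2 ratio divided by its standard error
paired_t = mean(paired_diffs) / (stdev(paired_diffs) / math.sqrt(len(paired_diffs)))

# Unpaired (Welch) t-statistic: the spread between replicates within each group
# ends up in the denominator
se_unpaired = math.sqrt(stdev(log_e2) ** 2 / len(log_e2) + stdev(log_n) ** 2 / len(log_n))
unpaired_t = (mean(log_e2) - mean(log_n)) / se_unpaired

print("per-replicate ratios:", [round(r, 2) for r in ratios])
print(f"paired t = {paired_t:.1f}, unpaired t = {unpaired_t:.1f}")
```

The ratios are consistent (about 3 to 4 fold) while the absolute levels vary several-fold between replicates, so the unpaired statistic stays small (about 1.9) and the paired one is large (about 13). If the replicates genuinely are paired (for example, each replicate set was prepared or sequenced together), the usual way to let DESeq2 use that is to add the pairing factor to the design, e.g. `~ replicate + condition`, where `replicate` is a hypothetical column in the sample table; whether that is justified depends on how the experiment was run.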

Is this variation between samples the reason these genes are not called statistically significant? And is there any way to address it?


You know that DESeq2 doesn't look at FPKM; it works on raw counts. Look at the normalized counts instead.
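To make "normalized counts" concrete: by default DESeq2 scales each sample's raw counts by a size factor estimated with the median-of-ratios method, and `counts(dds, normalized=TRUE)` in R returns the raw counts divided by those factors. The Python sketch below reimplements that idea in simplified form for illustration only; the toy count matrix is made up, and the real `estimateSizeFactors` also offers alternative estimators (e.g. for data where most genes contain a zero) that are not covered here.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    """
    n_samples = len(counts[0])
    # Only genes with a non-zero count in every sample have a finite log geometric mean
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene's count in sample j to that gene's geometric mean
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each sample's raw counts by that sample's size factor."""
    return [[c / s for c, s in zip(row, factors)] for row in counts]


# Hypothetical toy data: sample 2 was sequenced about twice as deeply as sample 1
raw = [
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 5],  # excluded from size-factor estimation (zero in sample 1)
]
sf = size_factors(raw)
norm = normalize(raw, sf)
print("size factors:", [round(s, 3) for s in sf])
print("normalized:", [[round(x, 1) for x in row] for row in norm])
```

After normalization, the genes that differ only because of sequencing depth have equal values in both samples, which is the scale on which it makes sense to eyeball replicate-to-replicate variation.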

written 13 days ago by swbarnes2
Kevin Blighe (University College London) wrote (13 days ago):

You may want to re-check your pipeline because DESeq2 does not usually produce FPKM values.

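In any case, to answer your question: yes, a large amount of variation between samples for a given gene will raise its p-value, because it inflates the gene's estimated dispersion and therefore the standard error of its fold change.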
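One way to see why: DESeq2 models each gene's counts as negative binomial with variance μ + αμ², where α is the gene's dispersion, and by default tests the log2 fold change with a Wald test (the estimate divided by its standard error). The Python sketch below uses a simple delta-method approximation of that standard error for two equal-sized groups, ignoring size factors and dispersion shrinkage, to show how a larger dispersion (more replicate-to-replicate variation) shrinks the Wald statistic and inflates the p-value for the same 3-fold change. The group means and dispersion values are hypothetical, and this is not DESeq2's actual code.

```python
import math


def wald_p(mean_a, mean_b, dispersion, n_per_group):
    """Approximate two-sided Wald p-value for the fold change between two groups.

    Uses Var(log of a group mean) ~ (1/mu + alpha) / n, the delta-method
    approximation for negative binomial counts with variance mu + alpha * mu^2.
    """
    var_log_a = (1 / mean_a + dispersion) / n_per_group
    var_log_b = (1 / mean_b + dispersion) / n_per_group
    se = math.sqrt(var_log_a + var_log_b)  # standard error of the natural-log fold change
    # Wald statistic; the ratio is the same whether both parts use natural log or log2
    z = math.log(mean_b / mean_a) / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the standard normal


# Hypothetical gene: mean count 100 in N, 300 in E2 (3-fold), 3 replicates each
for alpha in (0.01, 0.1, 0.5, 1.0):
    print(f"dispersion {alpha:>4}: p = {wald_p(100, 300, alpha, 3):.2g}")
```

With three replicates per group, the same 3-fold change goes from overwhelmingly significant at low dispersion to clearly non-significant once replicates vary as much as in the example from the question.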

Based on the information that you've provided, I imagine that your E1 and E2 groups do not form tight clusters on a PCA bi-plot of PC1 versus PC2. So, the underlying problem may be biological.
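For reference, the PC1/PC2 coordinates in such a plot are principal-component scores computed from log-transformed normalized counts; in DESeq2 this is typically done with `plotPCA()` on `vst()` or `rlog()` output, which by default uses the 500 most variable genes. The Python sketch below computes the first two PC scores from a small, hypothetical matrix of already-transformed values, using power iteration on the sample-by-sample Gram matrix so that it needs only the standard library. Replicates of the same group should land close together; widely scattered E1 or E2 replicates would be consistent with the explanation above.

```python
import math
import random


def pca_scores(samples, n_components=2, iters=500, seed=0):
    """Principal-component scores for each sample.

    samples: list of per-sample vectors of log-transformed expression values
    (same genes, same order in every sample).
    """
    n, g = len(samples), len(samples[0])
    # Center every gene across samples
    gene_means = [sum(s[k] for s in samples) / n for k in range(g)]
    x = [[s[k] - gene_means[k] for k in range(g)] for s in samples]
    # Sample-by-sample Gram matrix; its top eigenvectors, scaled by sqrt(eigenvalue),
    # are the PC scores
    gram = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.uniform(0.5, 1.5) for _ in range(n)]
        for _ in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0:  # no variance left to explain
                break
            v = [t / norm for t in w]
        eigval = sum(v[i] * sum(gram[i][j] * v[j] for j in range(n)) for i in range(n))
        components.append([t * math.sqrt(max(eigval, 0.0)) for t in v])
        # Deflate: remove this component before searching for the next one
        gram = [[gram[i][j] - eigval * v[i] * v[j] for j in range(n)] for i in range(n)]
    return [tuple(c[i] for c in components) for i in range(n)]


# Hypothetical log2 values for 4 genes: three "A" replicates, then three "B" replicates
group_a = [[5.0, 5.0, 1.0, 1.0], [5.2, 4.9, 1.1, 0.9], [4.8, 5.1, 0.9, 1.0]]
group_b = [[1.0, 1.0, 1.0, 1.0], [1.1, 0.9, 1.0, 1.1], [0.9, 1.1, 1.1, 0.9]]
labels = ["A1", "A2", "A3", "B1", "B2", "B3"]
for label, (pc1, pc2) in zip(labels, pca_scores(group_a + group_b)):
    print(f"{label}: PC1 = {pc1:6.2f}, PC2 = {pc2:6.2f}")
```

The sign of a principal component is arbitrary, so only the grouping matters: here the A replicates share one sign on PC1 and the B replicates the other.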

