Is variation between samples the reason DESeq2 is not calling DEGs significant?
3.8 years ago
molbioTH

I recently ran a transcriptome analysis with DESeq2 on four sample groups, with three replicates each: a negative control (N), a positive control (P), and two experimental groups (E1, E2).

Comparing P with N gave thousands of significant DEGs (padj < 0.05). However, comparing E1 with N gave no significant DEGs, and comparing E2 with N gave only a small number.
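For context, the padj column returned by DESeq2's results() is, by default, a Benjamini-Hochberg adjusted p-value (pAdjustMethod = "BH"), so a gene has to clear the 0.05 cutoff after correction for the many genes tested. The Python sketch below re-implements the BH procedure for illustration only; the function name benjamini_hochberg is hypothetical, this is not DESeq2's code, and it ignores DESeq2's independent filtering (which sets padj to NA for genes filtered out for low counts).

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum enforces
    # monotonicity: padj(i) = min over j >= i of p(j) * m / j.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

With raw p-values of 0.01, 0.04, 0.03 and 0.5, the adjusted values are about 0.04, 0.053, 0.053 and 0.5, so two genes that pass at p < 0.05 no longer pass at padj < 0.05.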

When I look at the FPKM values, I am surprised that some genes that look differentially expressed between E2 and N were not called significant and instead had a padj of 0.99. For example, one gene had FPKMs of 2300, 650, and 430 in E2 and 600, 150, and 140 in N, so in every replicate E2 showed roughly 3 to 4 times the expression of N.
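The arithmetic in that example is easy to check. This standard-library Python sketch uses only the FPKM values quoted above to compute the per-replicate ratios and the within-group spread on the log2 scale; it is an illustration, not what DESeq2 computes, since DESeq2 works on raw counts.

```python
import math
import statistics

# FPKM values quoted above for one gene (three replicates per group).
e2 = [2300, 650, 430]
n = [600, 150, 140]

# Replicate-by-replicate ratios, as described in the question.
ratios = [a / b for a, b in zip(e2, n)]

# Group means and within-group standard deviations on the log2 scale.
log_e2 = [math.log2(x) for x in e2]
log_n = [math.log2(x) for x in n]
mean_log2_fc = statistics.mean(log_e2) - statistics.mean(log_n)
sd_e2 = statistics.stdev(log_e2)
sd_n = statistics.stdev(log_n)

print([round(r, 2) for r in ratios])
print(round(mean_log2_fc, 2), round(sd_e2, 2), round(sd_n, 2))
```

The ratios are about 3.8, 4.3 and 3.1 and the mean log2 fold change is about 1.9, but the within-group standard deviation on the log2 scale is about 1.2 to 1.3 in both groups, so with three replicates the difference is not large relative to the noise. Note also that comparing replicate 1 of E2 with replicate 1 of N assumes those samples are matched; a design such as ~ condition treats the groups as unpaired, and a genuine matching factor (for example, batch) would need to be in the design to be used.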

Many other genes follow the same pattern: in each replicate, E2 shows at least twice the expression of N. However, expression varies considerably between replicates.

Is this variation between replicates the reason these genes are not called significant? And is there any way to address it?

RNA-Seq deseq2


Note that DESeq2 does not use FPKM values. Look at the normalized counts instead.
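DESeq2 models raw counts and accounts for sequencing depth with size factors estimated by the median-of-ratios method (estimateSizeFactors); counts(dds, normalized = TRUE) returns the raw counts divided by those factors. Below is a minimal numpy sketch of that normalization, assuming a genes x samples matrix of raw counts; the function names are hypothetical and this is not the package's implementation.

```python
import numpy as np


def size_factors_median_of_ratios(counts):
    """Size factors for a genes x samples array of raw counts."""
    counts = np.asarray(counts, dtype=float)
    # Only genes with a nonzero count in every sample enter the geometric mean.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Log geometric mean of each gene across samples (the pseudo-reference).
    log_geo_means = log_counts.mean(axis=1)
    # Each sample's size factor is its median ratio to the pseudo-reference.
    return np.exp(np.median(log_counts - log_geo_means[:, None], axis=0))


def normalized_counts(counts):
    """Divide each sample (column) by its size factor."""
    counts = np.asarray(counts, dtype=float)
    return counts / size_factors_median_of_ratios(counts)
```

For a toy matrix in which sample 2 is exactly twice sample 1, the size factors come out as about 0.71 and 1.41, and the normalized counts of the two samples are identical.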

3.8 years ago

You may want to re-check your pipeline because DESeq2 does not usually produce FPKM values.

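In any case, to answer your question: yes, a large amount of variation between replicates for a given gene will affect its p-value. DESeq2 estimates a dispersion for each gene, and high within-group variability inflates that dispersion, which widens the standard error of the log2 fold change and weakens the test.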
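To illustrate the point, here is a simplified Python sketch of a Wald-type test on the log fold change under a negative binomial model with variance mu + alpha * mu^2. It assumes equal size factors, a single shared dispersion alpha, and hypothetical group means of 300 and 1000 counts with three replicates per group. DESeq2 itself fits a GLM, shrinks dispersion estimates toward a fitted trend and then adjusts the p-values, so these numbers are not what DESeq2 would report.

```python
import math


def approx_wald_pvalue(mean_a, mean_b, dispersion, n_per_group):
    """Approximate two-sided p-value for the log fold change between two groups.

    Simplified sketch: equal size factors, one shared dispersion, and a
    negative binomial variance of mu + dispersion * mu**2.
    """
    log_fc = math.log(mean_b / mean_a)  # natural-log fold change
    # Delta-method variance of the log of a group mean of NB counts.
    var_a = (1 / mean_a + dispersion) / n_per_group
    var_b = (1 / mean_b + dispersion) / n_per_group
    z = log_fc / math.sqrt(var_a + var_b)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


for alpha in (0.01, 0.1, 0.5, 1.0):
    print(alpha, approx_wald_pvalue(300, 1000, alpha, 3))
```

Holding the roughly 3.3-fold difference fixed, the p-value rises from far below 1e-10 at alpha = 0.01 to above 0.05 at alpha = 1.0, purely because of the larger within-group variance.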

Based on the information that you've provided, I suspect that your E1 and E2 replicates do not form tight clusters on a PCA bi-plot of PC1 versus PC2. If so, the underlying problem may be biological.
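One way to check this is the sample PCA. In DESeq2 the usual route is plotPCA() on vst- or rlog-transformed data, which by default uses the 500 most variable genes. The numpy sketch below mimics that idea using log2(x + 1) of normalized counts as a stand-in for the variance-stabilizing transformation; the function name pca_scores and the choice of transform are assumptions for illustration.

```python
import numpy as np


def pca_scores(norm_counts, n_top=500):
    """PC1/PC2 sample coordinates from a genes x samples normalized count matrix."""
    # log2 with a pseudocount of 1, standing in for vst/rlog.
    log_expr = np.log2(np.asarray(norm_counts, dtype=float) + 1)
    # Keep the n_top most variable genes, similar in spirit to plotPCA(ntop = 500).
    top = np.argsort(log_expr.var(axis=1))[::-1][:n_top]
    x = log_expr[top].T  # samples x genes
    x = x - x.mean(axis=0)  # center each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :2] * s[:2]  # sample coordinates on PC1 and PC2
    explained = s**2 / np.sum(s**2)  # fraction of variance per component
    return scores, explained[:2]
```

If the E1 and E2 replicates scatter across PC1 and PC2 rather than grouping together, within-group variability is likely to limit power for those comparisons.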

Kevin

