Question

too many adj pvalues = 1

7.0 years ago

Illinu ▴ 110

Hi, while analyzing a set of DEGs from DESeq2, I noticed that for some genes where I 'see' differences, they are not significant, whereas for another gene with similar behaviour the difference is significant. They come from different comparisons, the former with 6 replicates and the latter with 3, so I guess the number of replicates in the comparison makes a difference. However, I decided to look at the p-value distribution and noticed that almost all p-values fall in the 1 bin. From what I have read, this could mean that the differential test assumes a distribution that the data does not have. Now I am confused about whether I should consider the p-values, the adjusted p-values, or do another test altogether and ignore DESeq2.

These are the genes. Gene A is DE when testing genotype 1 vs genotype 2 (6 replicates, adj p-value = 0.006), but it is not DE in the genotype 2 T vs C comparison (adj p-value = 0.8, red and blue dots). It has the same profile/behaviour as gene B, which is DE in genotype 2 T vs C (adj p-value = 0.016) but not DE when testing genotype 1 vs genotype 2 (adj p-value = 0.77). If I had to interpret these two genes as a biologist, I would say they are not differentially expressed between genotypes but are both induced by treatment in genotype 2. I am wondering how I can support this in a report while justifying the statistical results. [1] http://hpics.li/66db722

This is the p-value histogram for one of the comparisons; the other one looks the same. [2] http://hpics.li/5d0bdf2

RNA-Seq DESeq2 p-value • 2.8k views


score 1 · Answer 1 · 2017-05-07

7.0 years ago

Devon Ryan 104k

The only thing that matters is the adjusted p-value; ignore the unadjusted p-values.

Look at the error bars on gene B. That is why the difference isn't significant. The statistical results are correct; you have no basis upon which to disagree with them.
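To illustrate why adjusted p-values behave differently from raw ones, here is a minimal sketch of the Benjamini-Hochberg adjustment, which is the default method in DESeq2. It is written in plain Python rather than R, the function name `bh_adjust` is hypothetical, and the p-values are made up for the example. It is not a reimplementation of DESeq2, which also applies independent filtering before adjusting.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    # and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values, for illustration only.
raw = [0.0001, 0.004, 0.03, 0.2, 0.6, 0.9, 1.0, 1.0]
adj = bh_adjust(raw)
for p, q in zip(raw, adj):
    print(f"p = {p:<7} padj = {q:.4f}")
```

Each adjusted value is at least as large as its raw p-value, and large raw p-values are pushed up to 1, so a set of tests with many raw p-values near 1 will show many adjusted p-values equal to 1.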


Hi Devon, it just seems weird that the test finds gene A not significant between T and C for genotype 2, when it goes from 2000 to 8000 counts with small error bars. To me there is a clear induction of this gene by the treatment compared to genotype 1, but the test considers this difference not significant. So I should interpret this as the gene not being induced at T in genotype 1, I guess? I also wanted to know whether the p-value histogram with so many p-values = 1 indicates that something is wrong with the data. Thanks

ADD REPLY • link 7.0 years ago by Illinu ▴ 110

You might have an outlier sample, which is inflating the variance and decreasing your power. In the DESeq2 tutorial there are some examples of creating dendrograms and PCA plots. Have a look at those and see if one of the samples is obviously weird (in which case you can exclude it). DESeq2 will normally try to do that automatically, but you need at least 6 replicates per group for it to work.
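As a rough illustration of the sample-level check described above, the sketch below computes pairwise Euclidean distances between samples on log2-transformed counts, which is the kind of distance a dendrogram is built from, and reports the sample that sits furthest from the others on average. It is plain Python, not the DESeq2 workflow. The function name `mean_distance_per_sample`, the sample names, and the counts are all hypothetical, and it assumes the counts are already comparable across samples (for example, normalized for library size).

```python
import math


def mean_distance_per_sample(counts):
    """counts maps sample name -> list of counts (same gene order everywhere).

    Returns the mean Euclidean distance from each sample to all others,
    computed on log2(count + 1) values.
    """
    logged = {s: [math.log2(c + 1) for c in v] for s, v in counts.items()}
    names = list(logged)
    result = {}
    for a in names:
        dists = [math.dist(logged[a], logged[b]) for b in names if b != a]
        result[a] = sum(dists) / len(dists)
    return result


# Made-up counts for four genes in six samples, for illustration only.
counts = {
    "C_rep1": [2000, 150, 30, 900],
    "C_rep2": [2100, 140, 35, 950],
    "C_rep3": [1900, 160, 28, 870],
    "T_rep1": [8000, 155, 31, 910],
    "T_rep2": [7800, 145, 33, 940],
    "T_rep3": [300, 900, 400, 60],  # deliberately odd sample
}

scores = mean_distance_per_sample(counts)
suspect = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: mean distance {score:.2f}")
print("Most distant sample:", suspect)
```

A sample with a much larger mean distance than its replicates is a candidate outlier worth inspecting on a PCA plot or dendrogram before deciding whether to exclude it.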

ADD REPLY • link 6.9 years ago by Devon Ryan 104k