Question: too many adj pvalues = 1
3.2 years ago, Illinu wrote:

Hi, while analyzing a set of DEGs from DESeq2, I noticed that for some genes where I 'see' differences, the differences are not significant, whereas for another gene with similar behaviour the difference is significant. The two results come from different comparisons, the former with 6 replicates and the latter with 3, so I guess the number of replicates in the comparison makes a difference. However, I decided to look at the p-value distribution and noticed that almost all p-values fall in the bin at 1. From what I have read, this could mean that the differential test assumes a distribution that the data do not follow. Now I am confused about whether I should consider the p-values or the adjusted p-values, or ignore DESeq2 and run another test altogether.

These are the genes: gene A is DE when testing genotype 1 vs genotype 2 (6 replicates, adj p-value = 0.006), but it is not DE in the genotype 2 T vs C comparison (adj p-value = 0.8, red and blue dots), even though it has the same profile/behaviour as gene B, which is DE in genotype 2 T vs C (adj p-value = 0.016) but not DE when testing genotype 1 vs genotype 2 (adj p-value = 0.77). If I had to interpret these two genes as a biologist, I would say they are not differentially expressed between genotypes but are both induced by treatment in genotype 2. I am wondering how I can support this in a report while still justifying the statistical results. [1]

This is the p-value histogram for one of the comparisons; the other one looks the same. [2]
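To put a number on "almost all p-values fall in the bin at 1", one option is to bin the raw p-values and report the share that lands in the last bin. The sketch below is only an illustration in plain Python: the function names and the example values are made up, and it assumes the p-values have already been exported from the DESeq2 results table, with missing values (NA) represented as NaN or `None`.

```python
import math

def pvalue_histogram(pvalues, n_bins=10):
    """Count p-values per equal-width bin on [0, 1], skipping missing values."""
    counts = [0] * n_bins
    for p in pvalues:
        if p is None or math.isnan(p):
            continue  # missing p-values are skipped, not counted
        # p == 1.0 would index one past the end, so clamp it into the last bin
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts

def fraction_in_last_bin(pvalues, n_bins=10):
    """Share of non-missing p-values that fall in the bin closest to 1."""
    counts = pvalue_histogram(pvalues, n_bins)
    total = sum(counts)
    return counts[-1] / total if total else float("nan")

# Hypothetical p-values, for illustration only
example = [0.001, 0.2, 0.95, 1.0, 1.0, float("nan"), 0.99, 1.0]
hist = pvalue_histogram(example)
share = fraction_in_last_bin(example)
print(hist)
print(share)
```

Running the same function on both comparisons makes it easy to check whether "the other one looks the same" holds numerically as well as visually.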

rna-seq deseq2 p-value • 1.4k views
3.2 years ago, Devon Ryan (Freiburg, Germany) wrote:

The only thing that matters is the adjusted p-value; ignore the unadjusted p-values.

Look at the error bars on gene B. That is why the difference isn't significant. The statistical results are correct, and you have no basis upon which to disagree with them.

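For readers wondering what the adjustment in the answer above actually does: DESeq2's `results()` applies the Benjamini-Hochberg (BH) correction by default. The following is a minimal pure-Python sketch of the BH procedure, not DESeq2's own code. It assumes a plain list of raw p-values with no missing entries and does not reproduce DESeq2's independent filtering; the input values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p-values
adj = benjamini_hochberg(raw)
print(adj)
```

Each raw p-value is scaled by the number of tests divided by its rank, so with thousands of genes a raw p-value that looks small can still end up above the chosen cutoff, and large raw p-values are pushed up to the cap of 1.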

Hi Devon, it just seems odd that the test finds gene A not significant between T and C for genotype B, when it goes from 2000 to 8000 counts with small error bars. To me there is a clear induction of this gene by the treatment compared to genotype 1, but the test considers this difference not significant. So I should interpret this as the gene not being induced at T in genotype 1, I guess? I also wanted to know whether a p-value histogram with so many p-values equal to 1 indicates that something is wrong with the data. Thanks

Reply written 3.2 years ago by Illinu

You might have an outlier sample, which is inflating the variance and decreasing your power. In the DESeq2 tutorial there are some examples of creating dendrograms and PCA plots. Have a look at those and see if one of the samples is obviously weird (in which case you can exclude it). DESeq2 will normally try to do that automatically, but you need at least 6 replicates per group for it to work.

Reply written 3.2 years ago by Devon Ryan
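The dendrograms and PCA plots suggested in the reply above both rest on sample-to-sample distances. As a rough numeric stand-in for that visual check, the sketch below computes the mean Euclidean distance from each sample to all others and reports the most distant one. Everything here is an assumption for illustration: the sample names and counts are invented, and `log2(count + 1)` is a crude substitute for the variance-stabilizing or rlog transformations used in the DESeq2 tutorial. It is also not the method DESeq2 uses internally to flag outliers.

```python
import math

def log_transform(counts):
    """Crude stabilization: log2(count + 1) for each gene."""
    return [math.log2(c + 1) for c in counts]

def mean_distance_per_sample(samples):
    """samples: dict of sample name -> list of counts (same gene order).

    Returns a dict of sample name -> mean Euclidean distance to the others.
    """
    logged = {name: log_transform(c) for name, c in samples.items()}
    names = list(logged)
    result = {}
    for a in names:
        dists = [math.dist(logged[a], logged[b]) for b in names if b != a]
        result[a] = sum(dists) / len(dists)
    return result

# Hypothetical counts for four replicates of one condition, four genes each
samples = {
    "T_rep1": [2000, 150, 30, 900],
    "T_rep2": [2100, 140, 35, 950],
    "T_rep3": [1900, 160, 28, 870],
    "T_rep4": [200, 900, 300, 90],  # deliberately aberrant replicate
}
scores = mean_distance_per_sample(samples)
most_distant = max(scores, key=scores.get)
print(most_distant, scores[most_distant])
```

A sample that sits far from every other replicate in its group is a candidate for closer inspection. Whether it should be excluded remains a judgment call that the plots and the experimental records should inform.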