Hi, While analyzing a set of DEGs from DESeq2, I noticed that some genes where I 'see' a difference are not significant, while another gene with similar behaviour is significant. They come from different comparisons, the former with 6 replicates and the latter with 3, so I guess the number of replicates in the comparison makes a difference. However, I decided to look at the p-value distribution and noticed that almost all p-values fall in the bin at 1. From what I have read, this could mean that the differential test assumes a distribution that the data does not have. Now I am confused about whether I should use the p-values, the adjusted p-values, or ignore DESeq2 and run another test altogether.
These are the genes: gene A is DE when testing genotype 1 vs genotype 2 (6 replicates, adj p-value = 0.006), but it is not DE in the genotype 2 T vs C comparison (adj p-value = 0.8, red and blue dots). Yet it has the same profile/behaviour as gene B, which is DE in genotype 2 T vs C (adj p-value = 0.016) but not DE when testing genotype 1 vs genotype 2 (adj p-value = 0.77). If I had to interpret these two genes as a biologist, I would not say they are differentially expressed between genotypes, but rather that both are induced by treatment in genotype 2. I am wondering how I can support this in a report while justifying the statistical results. [1] http://hpics.li/66db722
This is the p-value histogram for one of the comparisons; the other one looks the same. [2] http://hpics.li/5d0bdf2
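To put a number on "almost all p-values fall in the bin at 1", the histogram can be summarised as bin counts plus the fraction of p-values in the last bin. The following is only a minimal sketch in plain Python, not part of DESeq2: the function names `pvalue_histogram` and `fraction_in_last_bin` are made up for illustration, the example p-values are invented, and missing values (None or NaN) are assumed to be skipped rather than counted.

```python
def pvalue_histogram(pvalues, n_bins=20):
    """Count p-values per equal-width bin on [0, 1], skipping missing values."""
    counts = [0] * n_bins
    for p in pvalues:
        if p is None or p != p:  # skip None and NaN (NaN is never equal to itself)
            continue
        # p == 1.0 would index one past the end, so clamp it into the last bin
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts


def fraction_in_last_bin(pvalues, n_bins=20):
    """Fraction of non-missing p-values that land in the bin closest to 1."""
    counts = pvalue_histogram(pvalues, n_bins)
    total = sum(counts)
    return counts[-1] / total if total else 0.0


# Invented p-values, for illustration only (not taken from the thread)
example = [0.001, 0.02, 0.35, 0.97, 0.99, 1.0, 1.0, 1.0, float("nan"), None]
hist = pvalue_histogram(example, n_bins=10)
frac = fraction_in_last_bin(example, n_bins=10)
print(hist)
print(frac)
```

Running the same summary again after dropping genes with very low counts is one way to check whether the spike at 1 comes mostly from those genes, which is a commonly suggested explanation for this histogram shape.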
Hi Devon, It just seems weird that the test finds gene A not significant between T and C for genotype 2, when it goes from 2000 to 8000 counts with small error bars. To me there is a clear induction of this gene by the treatment compared to genotype 1. But the test considers this difference not significant. So I should interpret this as the gene not being induced at T in genotype 1, I guess? I also wanted to know whether a p-value histogram with so many p-values equal to 1 indicates that something is wrong with the data. Thanks
You might have an outlier sample, which is inflating the variance and decreasing your power. The DESeq2 tutorial has some examples of creating dendrograms and PCA plots. Have a look at those and see if one of the samples is obviously weird (in which case you can exclude it). DESeq2 will normally try to handle outlier counts automatically, but by default its replacement of outlier counts only applies with 7 or more replicates per group.
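As a rough stand-in for eyeballing a dendrogram or PCA plot, each sample can be scored by its average correlation with all the other samples; the sample with the lowest score is the one to inspect first. This is only a sketch under stated assumptions, not DESeq2's own procedure: it uses a plain log2(count + 1) transform in place of DESeq2's transformations, the function names are hypothetical, and the counts are made up.

```python
import math


def log2_counts(counts):
    """Simple log2(count + 1) transform (an assumption, not DESeq2's transform)."""
    return [math.log2(c + 1) for c in counts]


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_correlation_per_sample(samples):
    """Map each sample name to its mean correlation with every other sample.

    `samples` maps a sample name to a list of counts, with genes in the
    same order for every sample.
    """
    logged = {name: log2_counts(counts) for name, counts in samples.items()}
    scores = {}
    for name, x in logged.items():
        others = [pearson(x, y) for other, y in logged.items() if other != name]
        scores[name] = sum(others) / len(others)
    return scores


# Made-up counts for 5 genes in 4 replicates; rep4 is deliberately odd
samples = {
    "rep1": [2000, 150, 30, 900, 12],
    "rep2": [2100, 140, 35, 950, 10],
    "rep3": [1900, 160, 28, 870, 15],
    "rep4": [300, 900, 400, 50, 700],
}
scores = mean_correlation_per_sample(samples)
suspect = min(scores, key=scores.get)
print(suspect)
```

A low score only marks a sample as worth a closer look. Whether to exclude it is still a judgment call that should be made from the PCA plot or dendrogram and from what is known about the sample.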