2.6 years ago

Leo
Hello, I am performing a differential gene expression analysis and I am having difficulty interpreting this p-value histogram. I do not know much about statistics and I am still learning. I am working with a single dataset (53 samples in total: 8 with MYCN amplification vs. 45 without MYCN amplification). I don't know whether the distribution of the p-values is good enough to proceed. Can someone give me a little help?

Thanks in advance!

That histogram looks really strange. When the null hypothesis holds for essentially every gene, a p-value histogram should be roughly uniform. If, on the other hand, there is an appreciable difference between the conditions, you would see a peak near zero and a fairly uniform tail extending to 1.
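To make those two expected shapes concrete, here is a small simulation sketch. It is not your data or any particular tool's method: it assumes normally distributed values with known unit variance and a simple two-sided z-test, and the function names (`simulate_pvalues`, `histogram`), the effect size, and the fraction of changed genes are all made up for illustration. Only the 8 vs. 45 group sizes come from the question.

```python
import random
from statistics import NormalDist, mean

def simulate_pvalues(n_genes, frac_de, effect, n_a=8, n_b=45, seed=0):
    """Two-sided z-test p-values for simulated genes (assumed unit variance)."""
    rng = random.Random(seed)
    std_norm = NormalDist()
    # Standard error of the difference in means when the variance is known to be 1
    se = (1 / n_a + 1 / n_b) ** 0.5
    n_de = int(frac_de * n_genes)
    pvals = []
    for g in range(n_genes):
        # The first n_de genes get a true shift in group A; the rest are null
        shift = effect if g < n_de else 0.0
        a = [rng.gauss(shift, 1.0) for _ in range(n_a)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_b)]
        z = (mean(a) - mean(b)) / se
        pvals.append(2 * (1 - std_norm.cdf(abs(z))))
    return pvals

def histogram(pvals, n_bins=10):
    """Count p-values in equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for p in pvals:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts

# No gene truly differs: bins should be roughly equal in height
null_hist = histogram(simulate_pvalues(2000, frac_de=0.0, effect=0.0))
# 20% of genes truly differ: first bin should stand out, the rest stay flat
de_hist = histogram(simulate_pvalues(2000, frac_de=0.2, effect=1.5, seed=1))

print("all null:", null_hist)
print("20% DE:  ", de_hist)
```

A histogram that matches neither pattern (for example a hump in the middle or a spike near 1) usually suggests that the test's assumptions are not holding for the data, which is why it is worth checking the steps before the test.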

You may be running into a problem caused by the imbalance in sample size between the conditions, by sample quality, or by some uncontrolled batch effect. Regardless, I would go back and do further QC on the samples leading up to this point, such as making PCA plots.
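As a rough sketch of the PCA check, the snippet below computes sample coordinates on the first two principal components with NumPy. It assumes you already have a genes x samples matrix of log-scale normalized expression; the matrix here is simulated, and the `batch` labels and the helper name `pca_scores` are hypothetical, invented only to show what a hidden batch effect looks like.

```python
import numpy as np

def pca_scores(log_expr, n_components=2):
    """PCA of samples from a genes x samples matrix of log-scale values."""
    # Center each gene across samples so the PCs describe between-sample variation
    centered = log_expr - log_expr.mean(axis=1, keepdims=True)
    # SVD of the samples x genes matrix; u * s gives the sample coordinates
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Simulated stand-in for real data: 500 genes, 53 samples
rng = np.random.default_rng(0)
n_genes, n_samples = 500, 53
log_expr = rng.normal(size=(n_genes, n_samples))

# Hypothetical hidden batch: 26 samples get a shift in 100 genes
batch = np.array([0] * 27 + [1] * 26)
log_expr[:100, batch == 1] += 2.0

scores, explained = pca_scores(log_expr)
print("variance explained by PC1, PC2:", explained)
print("mean PC1, batch 0:", scores[batch == 0, 0].mean())
print("mean PC1, batch 1:", scores[batch == 1, 0].mean())
```

In practice you would scatter-plot `scores[:, 0]` against `scores[:, 1]` and color the points by condition, and then by anything else you know about the samples (sequencing run, source, date). If the samples separate by something other than MYCN status, that is a candidate batch effect to account for.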

I think rpolicastro has already made good points. In addition, are you filtering out the low-count genes before running the differential expression steps? Also, what tool are you using for differential testing?
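For reference, a low-count filter can be as simple as the sketch below. This is one common rule of thumb, not the behavior of any specific package: keep a gene only if it reaches a minimum read count in at least as many samples as the smallest group (8 in this design). The function name `filter_low_counts`, the thresholds, and the toy counts are all illustrative assumptions.

```python
def filter_low_counts(counts, min_count=10, min_samples=8):
    """Keep genes with at least min_count reads in at least min_samples samples.

    counts: dict mapping gene name -> list of raw counts, one per sample.
    """
    return {
        gene: row
        for gene, row in counts.items()
        if sum(c >= min_count for c in row) >= min_samples
    }

# Toy example with 10 samples and made-up gene names
counts = {
    "geneA": [0, 1, 0, 2, 0, 0, 1, 0, 0, 0],            # almost never detected
    "geneB": [50, 60, 55, 70, 65, 80, 75, 90, 85, 95],  # expressed everywhere
    "geneC": [30, 25, 40, 0, 0, 0, 0, 0, 0, 0],         # expressed in 3 samples only
}

# min_samples=3 here only because the toy data has few samples
kept = filter_low_counts(counts, min_count=10, min_samples=3)
print(sorted(kept))
```

Genes with very few reads tend to produce discrete, poorly behaved p-values, and they can distort the histogram, so it is worth redrawing it after filtering.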