Help interpreting p value distribution
3 months ago

Hello everyone!

I am using Transcriptome Analysis Console (TAC) for differential expression (DE) testing of gene expression data and R for downstream analyses and visualization. I generated volcano plots for three separate datasets (let's call them A, B, and C), and the distribution of p-values does not look uniform (there are some gaps, as can be seen in the images below). Even in dataset B, whose p-value distribution otherwise looks good, there are no genes with -log10(p) around 4.

I have never seen this pattern before and I am not sure what to make of it. What do you think this indicates? Perhaps poor quality of the data?

3 months ago

That looks like a systematic artifact of the processing itself. Perhaps a filtering step went wrong, or some sort of normalization step kicks in.

The main point is that it is systematic and not a natural error process.

It is hard to see how merely poor-quality, noisy data would produce missing p-values only in a narrow band.
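
If it helps, here is a rough way to locate the gaps numerically instead of by eye, which may make it easier to match them against a threshold used somewhere in the pipeline. This is only a sketch in Python (the same binning is straightforward in R). The names `find_empty_bands` and `bin_width` are made up for illustration, and it assumes you can get the raw p-values out as a plain numeric vector. The demo data are synthetic, not yours.

```python
import numpy as np

def find_empty_bands(pvalues, bin_width=0.1):
    """Return (low, high) intervals on the -log10(p) scale that contain
    no values but are flanked by occupied bins on both sides.
    Hypothetical helper, not part of TAC or any library."""
    p = np.asarray(pvalues, dtype=float)
    # Keep only valid p-values; p == 0 would give an infinite -log10(p).
    p = p[np.isfinite(p) & (p > 0) & (p <= 1)]
    if p.size == 0:
        return []
    y = -np.log10(p)
    # Enough fixed-width bins so the last edge lies strictly above max(y).
    n_bins = int(np.floor(y.max() / bin_width)) + 1
    edges = np.arange(n_bins + 1) * bin_width
    counts, _ = np.histogram(y, bins=edges)
    occupied = np.flatnonzero(counts > 0)
    bands, start = [], None
    # Scan only between the first and last occupied bin.
    for i in range(occupied[0], occupied[-1] + 1):
        if counts[i] == 0:
            if start is None:
                start = i  # a run of empty bins begins here
        elif start is not None:
            bands.append((float(edges[start]), float(edges[i])))
            start = None
    return bands

# Synthetic demo: evenly spaced -log10(p) values with the band
# between 1.0 and 1.3 deliberately removed.
demo_y = 0.005 + 0.01 * np.arange(300)
demo_y = demo_y[(demo_y < 1.0) | (demo_y > 1.3)]
demo_bands = find_empty_bands(10.0 ** (-demo_y))
print(demo_bands)
```

Keep in mind that in the sparse upper tail, empty bins also occur by chance, so a gap is only suspicious when the bins on both sides of it are well populated. If the edges of a gap coincide with a cutoff (on p-value, fold change, or expression level), that would point to the filtering step.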


