P-values have very similar, near-significant values

Hi,

To give some context, I've been trying to compare global vs absolute normalization techniques on my dataset. For absolute scaling, I've been normalizing to the ERCC spike-ins within each sample, using both a DESeq2-based method and an edgeR/limma+voom-based method.

The DESeq2 method consists of first using RUVSeq to remove unwanted technical variation (without doing betweenLaneNormalization), then running DESeq2 with size factors estimated from the ERCC spike-ins via the controlGenes argument of estimateSizeFactors. Hopefully this is what I should be doing; the result is what I expected.
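
In outline, that route looks roughly like this (a sketch with placeholder object names such as counts, coldata and condition, not my exact code):

library(RUVSeq)
library(DESeq2)

spikes <- grep("^ERCC-", rownames(counts))              # rows holding the ERCC spike-ins
ruv    <- RUVg(as.matrix(counts), spikes, k = 1)        # unwanted-variation factor(s), no betweenLaneNormalization
coldata$W_1 <- ruv$W[, 1]                               # add the RUV factor to the model

dds <- DESeqDataSetFromMatrix(as.matrix(counts), coldata, design = ~ W_1 + condition)
dds <- estimateSizeFactors(dds, controlGenes = spikes)  # size factors from the spike-ins only
dds <- DESeq(dds)
res <- results(dds)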

However, the edgeR limma+voom adjusted p-values seem strange. My normalization followed the steps below:

library(edgeR)   # calcNormFactors
library(limma)   # voom

N  <- colSums(genes_expr)                        # total library size per sample, from all genes
nf <- calcNormFactors(ercc_expr, lib.size = N)   # TMM factors computed on the ERCC counts only
voom.norm <- voom(genes_expr, design, lib.size = N * nf, plot = TRUE)
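
For completeness, the adjusted p-values shown below come from the usual downstream limma calls (sketched here; the coef depends on the design):

fit <- lmFit(voom.norm, design)
fit <- eBayes(fit)
tt  <- topTable(fit, coef = 2, number = 10, sort.by = "P")   # adj.P.Val uses BH by default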

The lowest adjusted p-values look like this:

[screenshot: topTable output of the 10 transcripts with the lowest adj.P.Val]

The lowest adjusted p-values are all identical and only "nearly" significant. The raw p-value histogram looks like this:

[screenshot: histogram of the raw p-values]

I've never seen p-values behave this way, so I'm not sure whether it's normal. Is there a reason why these values are all somewhat close to 0 and nearly all identical? The most significant transcripts from limma+voom do correspond to those found with DESeq2, so I suspect it may just come down to a difference between the methods.

edgeR voom RNASeq DESeq2 limma

I think that is commonly observed when calculating BH-adjusted p-values. For more details, see: https://stats.stackexchange.com/questions/476658/r-why-are-my-fgsea-adjusted-p-values-all-the-same
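
A toy example shows where the ties come from (nothing to do with your particular data):

p <- c(0.001, 0.0012, 0.0013, 0.002, 0.4, 0.7)
p.adjust(p, method = "BH")
# 0.0026 0.0026 0.0026 0.0030 0.4800 0.7000
# BH forces the adjusted values to be monotone (a cumulative-minimum step),
# which frequently collapses the smallest p-values onto a single adjusted value.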

*Please do not post images of the data

In situations like this, you need to be able to explain to yourself what the "expected" distribution should look like, then evaluate the observed p-values against that.

First and foremost, the adjustment simply operates on the existing p-values: if the adjusted p-values look odd, the unadjusted ones are probably odder still, and the adjustment only magnifies an existing problem.

If the effect for which you compute a p-value is not present, the p-values ought to be uniform from 0 to 1, reflecting only the overall variability in the process.

If the effect is present, there should be a large excess near 0, showing that the frequency of the "real" observations exceeds the background, preferably many times over.
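
A quick simulation on made-up data shows the two shapes to expect:

set.seed(1)
p_null   <- replicate(5000, t.test(rnorm(5), rnorm(5))$p.value)            # no real effect
p_signal <- replicate(5000, t.test(rnorm(5), rnorm(5, mean = 2))$p.value)  # real effect

par(mfrow = c(1, 2))
hist(p_null,   breaks = 20, main = "No effect: roughly uniform")
hist(p_signal, breaks = 20, main = "Effect present: spike near 0")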

The adjusted p-values you have there are likely FDR values, and those have a different interpretation than traditional p-values even though they are often called the same thing. I would not get too hung up on that; various normalization factors may be applied that, in turn, could wipe out smaller-scale variation.
