Question about aggregating p values from different DE analysis methods for the same RNA-seq dataset
13 months ago
mmitra ▴ 60

Hi all,

I have an RNA-seq dataset (comparing two conditions) that I have analyzed with four different methods of differential expression (DE) analysis: DESeq2, limma-voom, limma-trend, and edgeR. From these four analyses I get, for every gene, a raw p-value (uncorrected for multiple testing). I would like to use Fisher's method to combine the raw p-values from the four analyses for each gene and then apply the multiple-testing correction. Would that be OK? I am wondering because the p-values come from different programs with different models. I would really appreciate any thoughts or comments. Thanks for the help!
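
To make it concrete, this is roughly the aggregation I have in mind (just a sketch in Python; the input file name, the column layout, and the use of scipy/statsmodels are placeholders for however each tool's raw p-values get exported and combined):

    import pandas as pd
    from scipy.stats import combine_pvalues
    from statsmodels.stats.multitest import multipletests

    # One row per gene, one column of raw p-values per DE tool
    # (columns such as "deseq2", "voom", "trend", "edger" are placeholders)
    pvals = pd.read_csv("raw_pvalues_by_tool.csv", index_col="gene_id")

    # Fisher's method: combine each gene's four raw p-values into one
    combined = pvals.apply(
        lambda row: combine_pvalues(row.dropna(), method="fisher")[1], axis=1
    )

    # Benjamini-Hochberg correction applied once, after aggregation
    padj = multipletests(combined, method="fdr_bh")[1]
    result = pd.DataFrame({"p_combined": combined, "padj": padj})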

fisher-test p-value RNA-seq
13 months ago
dsull ★ 5.8k

My two cents / thoughts:

A DE program is supposed to give you a list of DE genes, testing the null hypothesis that a gene is not DE. That already gives you your list of DE genes. Just pick one DE program and stick with it.

There are a few problems with your approach:

1) Such an approach has not been benchmarked (most DE programs run under default configurations have been benchmarked and shown to work well -- hence they were published). If combining results from multiple GLM regressions ensured good sensitivity without losing control of the false discovery rate, those programs would already have done it.

2) Fisher's method implicitly tests the alternative hypothesis that at least one of the programs would call a gene DE (given the information produced by all 4 programs). Is that really what you want to test? Every gene with a modest p-value ends up with a smaller combined p-value -- more false positives, because you are essentially increasing the degrees of freedom (see the simulation sketch after this list). I would use this alternative hypothesis for certain purposes (e.g. aggregating microarray probes for the same gene, or doing a meta-analysis across 10 different low-sample, underpowered studies to, in a sense, increase sample size) but not here. In essence, these aggregation methods are meant to combine incomplete parts into a summary of your results. Using the exact same data AND testing the exact same prediction from the null (i.e. that gene counts don't change) in all four programs does not really constitute the "independence" Fisher's method assumes.

3) Related to the above, it could be a form of p-value hacking. A t-test might not give me a significant p-value even though my data meets its assumptions, so, before running that test, I throw in another test [also one whose assumptions are all satisfied] just to ensure my p-values come out lower.
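
To illustrate point 2 with a quick simulation (purely illustrative -- here the four "programs" are just jittered copies of a single null p-value, an extreme stand-in for four tools run on the same counts): Fisher's method keeps its nominal false-positive rate only when the combined p-values are independent, and dependence inflates the fraction of genes called significant even though nothing is DE.

    import numpy as np
    from scipy.stats import combine_pvalues

    rng = np.random.default_rng(0)
    n_genes, n_tools, alpha = 10_000, 4, 0.05

    # One underlying null test per gene
    base = rng.uniform(size=n_genes)

    # Strongly dependent p-values: four near-copies of the same test
    dependent = np.clip(
        base[:, None] + rng.normal(scale=0.01, size=(n_genes, n_tools)), 1e-12, 1.0
    )

    # Truly independent null p-values, for comparison
    independent = rng.uniform(size=(n_genes, n_tools))

    def fisher_frac_significant(pmat):
        combined = [combine_pvalues(row, method="fisher")[1] for row in pmat]
        return np.mean(np.array(combined) < alpha)

    print("independent:", fisher_frac_significant(independent))  # ~0.05, as it should be
    print("dependent:  ", fisher_frac_significant(dependent))    # well above 0.05, under a global null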

I would therefore not recommend doing this. For similar reasons, I wouldn't recommend building your "DE genes list" from genes with adjusted p < 0.05 in at least one of the 4 programs. You could be more stringent and look at genes with adjusted p < 0.05 in ALL 4 programs, since that is conservative and those genes have pretty good evidence of being bona fide DE; but a "DE genes list" padded with false positives is much less desirable (I wouldn't even call it a "DE genes list" at that point).
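
If you do want a cross-program consensus, that stringent intersection is straightforward; a rough sketch, assuming each program's results were saved with a gene_id index and a padj column (the file and column names are placeholders):

    import pandas as pd

    files = {
        "deseq2": "deseq2_results.csv",
        "limma_voom": "limma_voom_results.csv",
        "limma_trend": "limma_trend_results.csv",
        "edger": "edger_results.csv",
    }

    sig_sets = []
    for tool, path in files.items():
        res = pd.read_csv(path, index_col="gene_id")
        sig_sets.append(set(res.index[res["padj"] < 0.05]))

    # Genes with adjusted p < 0.05 in ALL four programs
    consensus_de = set.intersection(*sig_sets)
    print(len(consensus_de), "genes significant in every program")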

tl;dr I'd recommend that you not do this.


Thanks a lot for the detailed explanation! This is really helpful. I had doubts about this approach myself, so I thought I would ask for the opinion of the researchers on this forum. I am looking for some lowly expressed RNAs, and my earlier approach was to select the genes that were called significant separately by each of the 4 programs (as you suggested in your last paragraph). Then I came across Fisher's aggregation for meta-analysis and was not sure whether that was the right way to do it. Thanks again for clarifying this and taking the time to go into the matter in depth.

