Question about aggregating p values from different DE analysis methods for the same RNA-seq dataset
3 months ago
mmitra ▴ 60

Hi all,

I have an RNA-seq dataset (comparing two different conditions) that I have analyzed using four different methods of differential expression (DE) analysis: DESeq2, limma-voom, limma-trend, and edgeR. From these 4 DE analyses, I get a set of raw p values (uncorrected for multiple hypothesis testing) for all the genes. I would like to use Fisher's method for p value aggregation to combine the raw p values from these 4 DE lists and then do the multiple hypothesis correction. Would that be OK? I am wondering because the p values are coming from different programs with different models. I would really appreciate any thoughts or comments. Thanks for the help!

fisher-test p-value RNA-seq • 366 views
3 months ago
dsull ★ 4.2k

My two cents / thoughts:

A DE program is supposed to give you a list of DE genes by testing, for each gene, the null hypothesis that the gene is not DE. You get your list of DE genes that way. Just pick a DE program and stick with it.

There are a few problems with your approach:

1) Such an approach has not been benchmarked (most DE programs run under default configurations have been benchmarked and shown to work well -- hence they were published). If combining results from multiple GLM regressions was good and ensured good sensitivity without losing control of the false discovery rate, those programs would already have done it.

2) The Fisher method has [implicitly] the alternative hypothesis that at least one of the programs would call a gene DE (given the info produced by all 4 programs). Is that really what you want to test? Every gene with a modest p-value ends up with a smaller combined p-value, which means more false positives, because you're basically increasing the degrees of freedom without adding any new data. I'd use this alternative hypothesis for certain purposes (e.g. aggregating microarray probes for the same gene, or doing a meta-analysis across 10 different low-sample underpowered studies to, in a sense, increase sample size) but not here. In essence, these aggregation methods are used to combine incomplete parts to summarize your results. Fisher's method assumes the p-values being combined are independent, and using the exact same data AND testing the exact same prediction from the null (i.e. gene counts don't change) does not really constitute "independence".

3) Related to above, it could be a form of p-value hacking. A t-test might not give me a significant p-value even though my data meets its assumptions so, before running that test, I throw in another test [also one where all the assumptions are satisfied] just to ensure my p-values are lower.
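To make point 2 concrete, here is a minimal sketch in plain Python (standard library only). It is an illustration under stated assumptions, not a benchmark: `fisher_combine` is a hypothetical helper written for this example, and the simulation assumes the extreme case where all 4 programs return the same p-value for a gene because they see the same data. Real programs are strongly but not perfectly correlated, so the actual inflation would be somewhat smaller.

```python
import math
import random


def fisher_combine(pvalues):
    """Combine p-values with Fisher's method (valid only if they are independent)."""
    k = len(pvalues)
    # Fisher's statistic is X = -2 * sum(ln p_i); under the null it follows a
    # chi-square distribution with 2k degrees of freedom. We keep X / 2 here.
    half_x = -sum(math.log(p) for p in pvalues)
    # Closed-form chi-square survival function for an even number (2k) of df.
    return math.exp(-half_x) * sum(half_x**i / math.factorial(i) for i in range(k))


# Hypothetical gene: four programs run on the SAME data each report p = 0.04.
combined = fisher_combine([0.04, 0.04, 0.04, 0.04])
print(f"single-program p = 0.04, Fisher-combined p = {combined:.4g}")

# Null simulation (assumption: no gene is DE, so each p is uniform on (0, 1],
# and the four programs agree exactly, i.e. the p-values are fully dependent).
random.seed(1)
n_genes = 20000
alpha = 0.05
hits = 0
for _ in range(n_genes):
    p = 1.0 - random.random()  # uniform on (0, 1], avoids log(0)
    if fisher_combine([p] * 4) < alpha:
        hits += 1
false_positive_rate = hits / n_genes
print(f"nominal alpha = {alpha}, observed false positive rate = {false_positive_rate:.3f}")
```

In this toy setup the combined p-value for the example gene drops to about 0.001 even though no new evidence was added, and the false positive rate under the null lands well above the nominal 0.05.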

I would therefore not recommend doing this. For similar reasons, I wouldn't recommend making your "DE genes list" by looking at genes with adjusted p < 0.05 in at least one of the 4 programs. You could be more stringent and look at genes with adjusted p < 0.05 in ALL 4 programs, since that's being conservative and those genes have pretty good evidence of being bona fide DE. Making a "DE genes list" with a bunch of false positives, on the other hand, is much less desirable (I wouldn't even call it a "DE genes list" at that point).
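As a rough sketch of the two list-building strategies in the paragraph above: the gene names and adjusted p-values below are made up purely for illustration, and `adjusted_p` is a hypothetical structure, not the output format of any of these programs.

```python
# Hypothetical adjusted p-values per program (all values invented for this example).
adjusted_p = {
    "DESeq2":      {"geneA": 0.001, "geneB": 0.030, "geneC": 0.200},
    "limma-voom":  {"geneA": 0.004, "geneB": 0.080, "geneC": 0.300},
    "limma-trend": {"geneA": 0.002, "geneB": 0.060, "geneC": 0.040},
    "edgeR":       {"geneA": 0.003, "geneB": 0.020, "geneC": 0.250},
}
alpha = 0.05

# Genes each program calls significant at the chosen threshold.
significant = {
    program: {gene for gene, p in genes.items() if p < alpha}
    for program, genes in adjusted_p.items()
}

# Conservative list: significant in ALL 4 programs.
in_all = set.intersection(*significant.values())
# Permissive list: significant in AT LEAST ONE program (not recommended above).
in_any = set.union(*significant.values())

print("significant in all programs:", sorted(in_all))
print("significant in any program: ", sorted(in_any))
```

Here only geneA survives the intersection, while the union also pulls in geneB and geneC, each of which is called by only some of the programs.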

tl;dr I'd recommend that you not do this.


Thanks a lot for the detailed explanation! This is really helpful. I was doubting this approach myself, so I thought I would ask for the opinion of the researchers on this forum. I am looking for some lowly expressed RNAs and my earlier approach was to select the genes that were called significant separately by each of the 4 programs (like you suggested in your last paragraph). Then I came across Fisher's method for aggregating p values in meta-analysis, and I was not sure whether that was the right way to do it. Thanks again for clarifying this and taking the time to go into the matter in depth.

