Question: Intersection of multiple methods for RNA-Seq differential expression: conservative or crazy?
Asked by Adamc (United States), 22 months ago:


For a while now, I've been using the intersection of the significant results from DESeq2 and edgeR as my standard for determining differentially expressed genes, treating edgeR as a filter over the DESeq2 results. I usually report fold changes from DESeq2, because its fold-change shrinkage when counts are low or highly variable keeps too much weight off results that are significant but low-confidence. I assumed that requiring significance in both methods would lower the false-positive rate without sacrificing too much in terms of false negatives.

However, a statistician recently challenged this assumption, and I must now consider the possibility that this approach increases the false-negative rate to an unacceptable level, or, conversely, doesn't reduce the false-positive rate enough to justify its use. We talked about using the SEQC dataset to actually run an analysis to figure this out. Before I get into that, though, has anyone tackled this idea before? Is there any statistical treatment out there of the effect of stacking/intersecting different differential expression methods? I haven't come across anything of the sort, but that of course doesn't mean I haven't missed something relevant.
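
With a benchmark where the truly differentially expressed genes are known, the trade-off can be measured directly by scoring each call set, including the intersection. A minimal Python sketch, assuming each method's significant genes and the known-true genes are already available as sets of gene IDs (how to derive such a truth set from the benchmark data is not covered here, and all gene IDs are hypothetical):

```python
def error_rates(called, truth):
    """Compare a set of DE calls with a set of genes known to be truly DE."""
    tp = len(called & truth)   # truly DE and called
    fp = len(called - truth)   # called, but not truly DE
    fn = len(truth - called)   # truly DE, but missed
    fdp = fp / len(called) if called else 0.0  # false discovery proportion
    fnr = fn / len(truth) if truth else 0.0    # false negative rate
    return {"TP": tp, "FP": fp, "FN": fn, "FDP": fdp, "FNR": fnr}


# Hypothetical toy inputs, only to exercise the function (not real results).
truth = {"g1", "g2", "g3", "g4"}
deseq2_calls = {"g1", "g2", "g3", "g5"}
edger_calls = {"g1", "g2", "g4", "g5", "g6"}

for name, calls in [
    ("DESeq2", deseq2_calls),
    ("edgeR", edger_calls),
    ("intersection", deseq2_calls & edger_calls),
]:
    r = error_rates(calls, truth)
    print(f"{name:>12}: FDP={r['FDP']:.2f}  FNR={r['FNR']:.2f}")
```

Averaging these numbers over many benchmark or simulated replicates, rather than a single toy example, is what would actually answer the question.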


Question written 22 months ago by Adamc; modified 22 months ago by Friederike

One of the issues you're going to run into is that these two packages are quite similar in how they work. It's mostly on the periphery of significance that you get discordant results. By taking the intersection, you're effectively just applying a more stringent p-value threshold to either package, while ignoring the fact that you're increasing false negatives. If you really insist on intersecting two packages, then at least use limma with voom, so you could argue that you're using genes found significant by two different statistical models.

BTW, there are innate problems with intersecting lists of significant genes. Suppose you chose a significance threshold of 0.05: you're then saying that if a gene has a p-value of 0.03 in one package and 0.05 in another, that gene will not be considered further. Does that really make even intuitive sense? It certainly doesn't make statistical sense.
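
The point about intersecting lists can be made concrete: requiring a gene to pass the same cutoff in both packages is exactly the same as applying that cutoff to the larger of its two p-values. A small Python sketch, assuming the adjusted p-values from each package have been exported into dictionaries keyed by gene ID (the gene IDs and values are hypothetical, apart from the 0.03/0.05 pair from the example above):

```python
ALPHA = 0.05  # significance cutoff applied to each package's adjusted p-values


def intersect_calls(p_a, p_b, alpha=ALPHA):
    """Genes called significant (p < alpha) by both methods."""
    sig_a = {g for g, p in p_a.items() if p < alpha}
    sig_b = {g for g, p in p_b.items() if p < alpha}
    return sig_a & sig_b


def max_p_calls(p_a, p_b, alpha=ALPHA):
    """The same rule written as a single cutoff on max(p_a, p_b)."""
    shared = p_a.keys() & p_b.keys()
    return {g for g in shared if max(p_a[g], p_b[g]) < alpha}


# Hypothetical adjusted p-values per gene.
deseq2 = {"geneA": 0.03, "geneB": 0.001, "geneC": 0.20}
edger = {"geneA": 0.05, "geneB": 0.004, "geneC": 0.01}

print(intersect_calls(deseq2, edger), max_p_calls(deseq2, edger))
```

Because max(p_a, p_b) is never smaller than either p-value on its own, the intersection is just a stricter cutoff, and geneA (0.03 vs. 0.05) is discarded even though the two packages nearly agree.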

Reply written 22 months ago by Devon Ryan

It's mostly on the periphery of significance that you get discordant results

Yes, the original intent was to increase stringency. Although both packages use the negative binomial distribution, I've occasionally seen cases where DESeq2 and edgeR assigned fold changes in opposite directions, even where both p-values were significant. That indicated something strange going on with the gene or the dataset, which was then worth investigating (or filtering out).
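
Cases like these can be screened for systematically rather than spotted by eye. A minimal Python sketch, assuming the log2 fold changes and adjusted p-values from both packages have been exported into dictionaries keyed by gene ID (all names and numbers are hypothetical); it lists genes that both methods call significant but with opposite fold-change signs:

```python
def discordant_direction(res_a, res_b, alpha=0.05):
    """Genes significant in both results whose log2 fold changes have opposite signs.

    Each result maps gene ID -> (log2_fold_change, adjusted_p_value).
    """
    flagged = []
    for gene in sorted(res_a.keys() & res_b.keys()):
        lfc_a, p_a = res_a[gene]
        lfc_b, p_b = res_b[gene]
        # A negative product means the two fold changes point in opposite directions.
        if p_a < alpha and p_b < alpha and lfc_a * lfc_b < 0:
            flagged.append(gene)
    return flagged


# Hypothetical exported results: gene -> (log2FC, padj)
deseq2 = {"geneA": (1.8, 0.001), "geneB": (-0.9, 0.02), "geneC": (0.4, 0.30)}
edger = {"geneA": (2.1, 0.002), "geneB": (0.7, 0.03), "geneC": (-0.5, 0.01)}

print(discordant_direction(deseq2, edger))
```

Only genes significant in both lists are flagged, so this picks out exactly the "both significant, opposite directions" situation described above.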

I agree that perhaps it would have made more sense to simply apply a more stringent p-value threshold to one package, instead of intersecting two.

Reply written 22 months ago by Adamc; modified 22 months ago

But then you'd miss the cases where the tools disagree even on the direction of the change. Are those generally cases where a more stringent p-value would have helped? I have no real insight (meaning: I'm too lazy to dig into this right now) into how closely connected the p-value calculation and the adjustment for the variance are. It's an interesting question, though, and it might be worth posting on the Bioconductor support site to ask the authors of the respective packages directly for their thoughts.

Reply written 22 months ago by Friederike

Do you have a background in statistics that allows you to make these kinds of assumptions beyond plain intuition? If not, I would strongly recommend not doing this kind of "experimental" analysis and instead following the outstandingly well-documented workflows of edgeR or DESeq2. The point is that, even though what you do is technically possible, you probably did not validate your results. This is a common problem in any bioinformatics analysis: changing parameters from the defaults can dramatically alter the outcome, and without validation of the results, one should really stay with the defaults unless one has expert knowledge. Statistical methods to reduce false differential calls in microarrays/RNA-seq have been under constant development and improvement for more than a decade, so I really do not think that a simple intersection can outperform them. If you want to add something new, better make use of additional methods for error correction, such as Michael Love's alpine package to correct GC bias, rather than these homebrew methods.

Reply written 22 months ago by ATpoint; modified 22 months ago

When I started analyzing RNA-Seq data, there was no clear consensus on which analysis approach would become the "gold standard": both DESeq and edgeR were new-ish, unlike Affymetrix microarrays, where limma had been the clear choice for years. Hence this sort-of weak ensemble approach. In retrospect, a more statistically sound technique, such as statistically combining p-values, would of course make more sense. We also always validate selected genes, spanning a range of fold changes and p-values, with qPCR to confirm that the qPCR and RNA-Seq results are strongly correlated, although I never intentionally selected genes that were significant by only one of the DE approaches.
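
As one example of statistically combining p-values, Fisher's method computes X = -2 Σ ln p_i over k tests and compares X against a chi-squared distribution with 2k degrees of freedom. A minimal, dependency-free Python sketch follows (SciPy's `scipy.stats.combine_pvalues` with `method="fisher"` does the same). Caveat: Fisher's method assumes independent p-values, and DESeq2 and edgeR p-values computed from the same counts are strongly correlated, so the combined value will tend to be too small (anti-conservative). It is shown only to illustrate the idea, not as a recommended replacement:

```python
import math


def fisher_combine(pvalues):
    """Fisher's combined p-value, valid for independent tests only.

    X = -2 * sum(ln p_i) follows a chi-squared distribution with 2k degrees
    of freedom under the null; for even df the upper tail has a closed form.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # P(chi2_{2k} > X) = exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!
    tail = sum(half_x ** j / math.factorial(j) for j in range(k))
    return math.exp(-half_x) * tail


# The 0.03 / 0.05 gene from the discussion, treated (unrealistically) as independent.
print(fisher_combine([0.03, 0.05]))
```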

What I'm trying to establish now is how much "damage" was, or could have been, done by this sort of naive approach, and whether this has been adequately answered or addressed somewhere already. I've heard that this kind of intersection is not uncommon, so a conclusion on this matter could be useful for the community.

Reply written 22 months ago by Adamc
Answer by Friederike (United States), 22 months ago:

Very interesting question, which I have not seen explored extensively.

I cannot contribute any statistical insights, but from a practical standpoint I just wanted to add that we also often check the results of DESeq2 with limma-voom and edgeR because we've seen that the overlap tends to be a good indicator for whether we should worry and go back to look at the data more closely or whether we can move ahead. If the overlap is abysmal (especially if edgeR and DESeq2 disagree on a large number of results), that is usually an indication that something funky is going on, which brings out the worst (or the best?) of each method. If the agreement is good, it doesn't really matter which program you go with.
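
One simple way to put a number on "the overlap" is a pairwise Jaccard index (shared calls divided by all calls) between the significant gene sets from each tool. A minimal Python sketch, assuming the significant gene IDs from DESeq2, edgeR and limma-voom have already been collected into sets; the gene IDs are hypothetical, and what counts as an "abysmal" score is left open, since no cutoff is given here:

```python
from itertools import combinations


def jaccard(a, b):
    """Shared calls divided by the union of calls; 1.0 means identical lists."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(calls):
    """Jaccard index for every pair of tools in a {tool name: set of genes} dict."""
    return {
        (t1, t2): jaccard(calls[t1], calls[t2])
        for t1, t2 in combinations(sorted(calls), 2)
    }


# Hypothetical significant-gene sets from each tool.
calls = {
    "DESeq2": {"g1", "g2", "g3", "g4"},
    "edgeR": {"g1", "g2", "g3", "g5"},
    "limma-voom": {"g1", "g2", "g6"},
}
for pair, score in pairwise_overlap(calls).items():
    print(pair, round(score, 2))
```

A low DESeq2/edgeR score in particular would correspond to the "disagree on a large number of results" warning sign above.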

Answer written 22 months ago by Friederike

Yes, precisely, if they are showing minimal overlap, then for me it would suggest that there's an issue with the structure of the data and/or the normalisation process.

If you've processed the data correctly using each tool, then any transcripts that are genuinely biologically different should come out as statistically significant in all methods. I would not even be looking at transcripts at the alpha 5% threshold.

Reply written 22 months ago by Kevin Blighe

I would not even be looking at transcripts at the alpha 5% threshold.

Are you referring to genes which are "borderline" significant?

Reply written 22 months ago by Adamc