Question: Trimming redundant gene sets after GSEA analysis
fizer wrote (3.2 years ago):


I am performing GSEA with p-values and fold changes of genes (Homo sapiens) obtained from RNA-seq data. Is it a good idea to reduce redundant gene sets afterwards? When I plot the GSEA results as a network plot, so many significant terms are drawn that the plot becomes very hard to read. I know that people reduce redundant terms before or after GO over-representation analysis (hypergeometric test), but I am not sure whether it should be done after a GSEA-type analysis. I want to keep significant terms at the specific level and remove a general term if a more specific term contains >=50% of the general term's genes. Is there any method available? Any suggestions are welcome.
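One reading of the criterion above can be sketched in Python. This is an illustration under stated assumptions, not an existing tool: all terms passed in are assumed to be already significant, "more specific" is approximated by "has fewer genes" (the GO hierarchy itself is not used), and `drop_general_terms` and `min_fraction` are hypothetical names.

```python
def drop_general_terms(terms, min_fraction=0.5):
    """Drop a term if some smaller term covers >= min_fraction of its genes.

    terms: dict mapping term name -> set of gene IDs (all assumed significant).
    Gene-set size is used as a stand-in for GO specificity (an assumption).
    """
    kept = {}
    for general, g_genes in terms.items():
        covered = any(
            len(s_genes) < len(g_genes)  # candidate is "more specific"
            and len(s_genes & g_genes) / len(g_genes) >= min_fraction
            for specific, s_genes in terms.items()
            if specific != general
        )
        if not covered:
            kept[general] = g_genes
    return kept


# Hypothetical example: "T cell activation" holds 6 of the 10 genes of
# "immune response", so the more general term is dropped.
terms = {
    "immune response": {f"g{i}" for i in range(10)},
    "T cell activation": {f"g{i}" for i in range(6)},
    "cytokine production": {"g20", "g21", "g22", "g23"},
}
print(sorted(drop_general_terms(terms)))
```

Note that the fraction is measured against the general term's gene count; measuring it against the specific term instead would give a different (stricter or looser) filter depending on the set sizes.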


I am not sure about GSEA results...

But for GO enrichment analysis with goseq, I usually remove the too-specific and too-general terms for plots. I have written an R package called gogadget (gogadget: an R package for GO analysis visualization and interpretation), which has a filter function.
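Removing too-specific and too-general terms is often approximated by a window on gene-set size. The Python sketch below illustrates only that idea; it is not gogadget's filter function (whose interface is not shown in this thread), and the bounds `min_genes=5` and `max_genes=500` are hypothetical defaults.

```python
def filter_by_size(terms, min_genes=5, max_genes=500):
    """Keep terms whose gene-set size lies within [min_genes, max_genes].

    Very small sets are treated as "too specific" and very large sets as
    "too general"; the bounds are illustrative, not recommended values.
    """
    return {
        name: genes
        for name, genes in terms.items()
        if min_genes <= len(genes) <= max_genes
    }


terms = {
    "tiny": {"a", "b"},
    "medium": {f"m{i}" for i in range(10)},
    "huge": {f"h{i}" for i in range(1000)},
}
print(sorted(filter_by_size(terms)))
```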

But there are more tools available, such as REVIGO or GO Trimming.

Good luck!

(Reply by Benn, 3.2 years ago)
alserg wrote (3.2 years ago):

One of the things I was playing with to reduce redundant gene sets was Bayesian-network-like filtering. There is an example of it here:

The idea is the following. Consider two enriched, overlapping pathways p1 and p2. First, hypothesize that p1 is truly enriched and p2 is just piggybacking on it because of the overlap. You can test this hypothesis by looking at the genes unique to p2, that is, setdiff(p2, p1). If these genes are also enriched, then the hypothesis is false and you had better keep p2. You can also check the other way round, i.e. whether p1 has some unique enrichment compared to p2. By repeating this operation you can come up with a list of uniquely enriched pathways.
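A minimal Python sketch of this pairwise filtering, under several assumptions that go beyond the post: it swaps in a one-sided hypergeometric (over-representation) test on a hypothetical set of "hit" genes instead of a rank-based GSEA statistic, it processes pathways greedily from most to least enriched, and it uses one hypothetical threshold (0.05) for every test. All function names are illustrative, not from any package.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K hits, n draws)."""
    upper = min(K, n)
    if k > upper:
        return 0.0
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def enrichment_p(genes, hits, universe):
    """One-sided over-representation p-value of `hits` inside `genes`."""
    genes = genes & universe
    if not genes:
        return 1.0  # nothing left to test: treat as not enriched
    return hypergeom_sf(len(genes & hits), len(universe), len(hits & universe), len(genes))


def collapse_pathways(pathways, hits, universe, threshold=0.05):
    """Keep pathways whose unique genes stay enriched against every kept pathway."""
    pvals = {name: enrichment_p(genes, hits, universe) for name, genes in pathways.items()}
    kept = []
    for name in sorted(pathways, key=pvals.get):  # most enriched first
        if pvals[name] >= threshold:
            continue  # not enriched on its own
        # Hypothesis: `name` only piggybacks on an already-kept pathway `parent`.
        # It survives only if setdiff(name, parent) is enriched for every parent.
        piggybacks = any(
            enrichment_p(pathways[name] - pathways[parent], hits, universe) >= threshold
            for parent in kept
        )
        if not piggybacks:
            kept.append(name)
    return kept


# Hypothetical toy data: 100 genes, of which g0..g9 are "hits".
universe = {f"g{i}" for i in range(100)}
hits = {f"g{i}" for i in range(10)}
pathways = {
    "A": {f"g{i}" for i in range(8)} | {"g50", "g51"},
    "B": {f"g{i}" for i in range(6)} | {"g60"},  # mostly inside A
    "C": {"g8", "g9", "g70"},                    # disjoint from A, still enriched
    "D": {f"g{i}" for i in range(80, 85)},       # no hits at all
}
print(collapse_pathways(pathways, hits, universe))  # ['A', 'C']
```

Because pathways are visited from most to least enriched, the reverse check from the post (whether p1 has unique enrichment relative to p2) is settled in favour of the more enriched pathway here; that ordering is a design choice of this sketch, not something the post prescribes.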

This not only removes redundant pathways but also leaves pathways at the most enriched level, which I found useful. However, there are several arbitrary p-value thresholds involved, so one needs to be a little careful with interpretation. Otherwise, it worked pretty well for me.
