Question

Merge enrichment analysis results

0

3.8 years ago

rin ▴ 40

Hi all,

I am working with a set of differentially expressed genes in different conditions and in order to understand the results I did various overrepresentation analyses (against Reactome, KEGG and GO terms) and I was wondering if there is a way to "merge" the results all together into unique processes to reduce the redundancy that comes from overlapping processes.

Any idea/tool will be much appreciated!

Thank you!

R enrichment analysis • 1.8k views

updated 3.8 years ago by demoraesdiogo2017 ▴ 100 • written 3.8 years ago by rin ▴ 40

score 0 · Answer 1 · 2020-06-24

Generally, you should not merge terms across different term sets (KEGG, Reactome, GO, etc.), but there is a package called GOSemSim that measures semantic similarity between GO terms, which can be used to collapse redundant ones and might be useful for you. GO terms tend to be worse offenders than the pathway terms from KEGG and Reactome because of GO's tiered hierarchy, so this should get you at least part of the way there.

clusterProfiler has a function called simplify that will do this for its GO enrichment results quite easily.

score 0 · Answer 2 · 2020-06-24

Another thing you could try is to visualize your enriched sets as a network in which each node is a gene set and edges connect sets that share genes. Redundant or overlapping sets will then cluster together in the network, and you may see a more global picture.
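As a rough illustration of the network idea, here is a minimal Python sketch that links gene sets whose Jaccard index exceeds a threshold and then reads off the connected components as clusters of overlapping terms. The function names, the 0.5 threshold and the toy gene sets are all assumptions made for this example; it is not how any particular tool implements it.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene sets: shared genes / all genes."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_clusters(gene_sets, threshold=0.5):
    """Group gene sets into connected components of the overlap network.

    gene_sets: dict mapping term name -> iterable of gene symbols.
    threshold: minimum Jaccard index for an edge (arbitrary choice here).
    """
    # One node per term; add an edge when the overlap reaches the threshold.
    neighbours = {name: set() for name in gene_sets}
    for a, b in combinations(gene_sets, 2):
        if jaccard(gene_sets[a], gene_sets[b]) >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Depth-first search to collect the connected components.
    seen, clusters = set(), []
    for start in gene_sets:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        clusters.append(sorted(component))
    return clusters


# Toy, made-up gene sets purely for illustration.
example_sets = {
    "GO:cell cycle": {"CDK1", "CCNB1", "CDC20", "PLK1"},
    "REACTOME:Mitotic cell cycle": {"CDK1", "CCNB1", "CDC20", "BUB1"},
    "KEGG:Apoptosis": {"CASP3", "BAX", "BCL2"},
}
clusters = overlap_clusters(example_sets, threshold=0.5)
```

With these toy sets, the two cell-cycle terms share three of five genes and end up in one cluster, while the apoptosis term stays on its own. Each cluster could then be summarised by a representative term, for example the one with the lowest adjusted p-value.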

This is what is done, for example, by the EnrichmentMap plugin in Cytoscape. Although it is adapted to the specific output of some enrichment tools, you could build your own input .GMT files if you have the lists of the gene sets.
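If you go the route of building your own .GMT input, the format is simple: one gene set per line, tab-separated, with the set name first, a description second, and the genes after that. A minimal Python sketch follows; the helper names and the "na" placeholder description are assumptions for this example, and you should check what your downstream tool expects in the description column.

```python
def to_gmt_lines(gene_sets, descriptions=None):
    """Format gene sets as GMT lines: name, description, then genes, tab-separated.

    gene_sets: dict mapping set name -> iterable of gene symbols.
    descriptions: optional dict mapping set name -> free-text description.
    """
    descriptions = descriptions or {}
    lines = []
    for name, genes in gene_sets.items():
        description = descriptions.get(name, "na")  # placeholder when none given
        fields = [name, description] + sorted(set(genes))  # de-duplicate genes
        lines.append("\t".join(fields))
    return lines


def write_gmt(path, gene_sets, descriptions=None):
    """Write the gene sets to a .gmt file, one set per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for line in to_gmt_lines(gene_sets, descriptions):
            handle.write(line + "\n")


# Toy, made-up gene set purely for illustration.
example = {"KEGG:Apoptosis": ["CASP3", "BAX", "BCL2"]}
gmt_lines = to_gmt_lines(example, {"KEGG:Apoptosis": "toy example set"})
```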

score 0 · Answer 3 · 2020-06-24

0

3.8 years ago

demoraesdiogo2017 ▴ 100

You can do this for GO terms with the ReviGO web app:

http://revigo.irb.hr

If you are doing it across different databases, merging is less straightforward, as each database has its own way of building its library.

If you are using EnrichR to get these enrichments, I would recommend building a single table from all the downloadable data, labelling where each enrichment came from, filtering by false discovery rate, and ordering it by enrichment score, perhaps focusing on the highest scores. This should significantly reduce the number of terms you are looking at and, depending on the size of the table, you could hide redundant terms manually (but keep the full table as supplementary material).
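The table-building steps could look roughly like the Python sketch below. The keys "term", "adj_p" and "score" are hypothetical names chosen for this example, so you would need to map them to whatever columns your exported tables actually contain; the 0.05 cutoff and the toy values are likewise just assumptions.

```python
def merge_enrichment_tables(tables, fdr_cutoff=0.05):
    """Combine per-database results into one labelled, filtered, sorted table.

    tables: dict mapping source name -> list of row dicts with the
            (hypothetical) keys "term", "adj_p" and "score".
    """
    merged = []
    for source, rows in tables.items():
        for row in rows:
            # Keep only rows passing the false discovery rate cutoff.
            if row["adj_p"] <= fdr_cutoff:
                # Label each row with the database it came from.
                merged.append({**row, "source": source})
    # Highest enrichment score first.
    merged.sort(key=lambda r: r["score"], reverse=True)
    return merged


# Toy, made-up numbers purely for illustration.
example_tables = {
    "GO": [
        {"term": "cell cycle", "adj_p": 0.001, "score": 50.0},
        {"term": "mitosis", "adj_p": 0.2, "score": 80.0},
    ],
    "KEGG": [
        {"term": "Apoptosis", "adj_p": 0.01, "score": 120.0},
    ],
}
merged = merge_enrichment_tables(example_tables)
```

Here the "mitosis" row is dropped by the cutoff despite its high score, and the remaining rows keep a "source" label so the origin of each term stays visible in the merged table.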

0

I'm not sure, but this made me think: if you merge the tables before filtering on the adjusted p-values to select pathways, shouldn't you re-adjust the p-values for multiple testing using all the p-values in the new, "bigger" table? You are going to compare these tests with each other, and the multiple-testing adjustment was probably done for each table separately (how much this matters will depend especially on the p-adjust method used).
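To make the question concrete, here is a minimal Python sketch of pooling the raw p-values from all tables and applying the Benjamini-Hochberg adjustment once over the pooled list. It assumes you still have the raw (unadjusted) p-values, and that BH is the adjustment you want; the function names and toy values are made up for this example.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def readjust_pooled(tables):
    """Pool raw p-values from all tables and adjust them together.

    tables: dict mapping source name -> list of (term, raw_p) tuples.
    Returns a list of (source, term, raw_p, pooled_adjusted_p) tuples.
    """
    pooled = [(source, term, p)
              for source, rows in tables.items()
              for term, p in rows]
    adjusted = benjamini_hochberg([p for _, _, p in pooled])
    return [(s, t, p, q) for (s, t, p), q in zip(pooled, adjusted)]


# Toy, made-up p-values purely for illustration.
example_raw = {
    "GO": [("term a", 0.01), ("term b", 0.04)],
    "KEGG": [("term c", 0.03), ("term d", 0.005)],
}
pooled_rows = readjust_pooled(example_raw)
per_table = {src: benjamini_hochberg([p for _, p in rows])
             for src, rows in example_raw.items()}
```

Comparing `pooled_rows` with `per_table` on the toy data shows the pooled adjustment is never smaller than the per-table one, since more tests are being corrected for at once.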

3.8 years ago by Papyrus ★ 2.9k

0

Now that you mention it, I think you are correct, but mostly because the enrichment scores are also calculated from the p-values, so the enrichment scores from different libraries are likely not comparable with each other. I think it is still reasonable to filter on the false discovery rate of each database separately (the re-adjustment you mention would, I think, be possible by modifying the EnrichR source code). You could then rank the enrichment scores within each table rather than all at once. How much this helps depends on the number of significant enrichments, but it should greatly reduce the number of terms you are working with.
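A minimal Python sketch of that per-table alternative: filter each database on its own adjusted p-values, then rank by score within that database only. The keys "term", "adj_p" and "score", the cutoff and the `top_n` parameter are hypothetical choices for this example.

```python
def top_terms_per_library(tables, fdr_cutoff=0.05, top_n=10):
    """Filter and rank each database separately.

    tables: dict mapping source name -> list of row dicts with the
            (hypothetical) keys "term", "adj_p" and "score".
    Returns a dict mapping source name -> list of its top term names.
    """
    result = {}
    for source, rows in tables.items():
        # Apply the FDR cutoff within this database only.
        significant = [r for r in rows if r["adj_p"] <= fdr_cutoff]
        # Rank by score within this database, highest first.
        significant.sort(key=lambda r: r["score"], reverse=True)
        result[source] = [r["term"] for r in significant[:top_n]]
    return result


# Toy, made-up numbers purely for illustration.
example_libraries = {
    "GO": [
        {"term": "cell cycle", "adj_p": 0.001, "score": 50.0},
        {"term": "cell division", "adj_p": 0.002, "score": 65.0},
        {"term": "mitosis", "adj_p": 0.2, "score": 80.0},
    ],
    "KEGG": [
        {"term": "Apoptosis", "adj_p": 0.01, "score": 120.0},
    ],
}
top_terms = top_terms_per_library(example_libraries, top_n=1)
```

Because scores are never compared across databases here, a library with systematically larger scores cannot crowd out the others.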

3.8 years ago by demoraesdiogo2017 ▴ 100