Enrichment analysis software examples
4.7 years ago
s.may-wilson

I have recently begun an enrichment analysis to find out whether any pathways or biological functions are over-represented in a gene set I have. Part of my analysis pipeline involves building two different models that predict the expression levels of a specific list of genes. I therefore have a subset of this gene list for which one model significantly outperforms the other, and I want to run the enrichment analysis on that subset.

I've tried FUMA-GWAS (https://fuma.ctglab.nl/) and ConsensusPathDB (http://cpdb.molgen.mpg.de/). Both seem like excellent tools for almost exactly what I'm trying to do, but both compare my subset against all human genes (strictly speaking this isn't the case with FUMA, but I have had some issues using it). That would be fine, except that my initial gene list was not chosen at random, so an enrichment analysis against all genes might simply reflect enrichment in my overall gene list rather than in the subset. I therefore need a tool that lets me supply my overall gene list as the background.
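A minimal sketch of why the background matters, using a one-sided hypergeometric test and entirely made-up counts (the sizes of the genome, the preselected list, the subset and the number of "immune" genes in each are hypothetical, not taken from this analysis). The same subset looks hugely enriched against all genes but unremarkable against the preselected list it was drawn from:

```python
from math import comb


def upper_tail_hypergeom(k, M, K, N):
    """P(X >= k), where X counts annotated genes among N genes drawn without
    replacement from M background genes, K of which carry the annotation."""
    hi = min(K, N)
    if k > hi:
        return 0.0
    # Sum exact integer counts first, then divide once to limit rounding error.
    favourable = sum(comb(K, i) * comb(M - K, N - i) for i in range(max(k, 0), hi + 1))
    return favourable / comb(M, N)


# Hypothetical counts, chosen only to illustrate the effect of the background.
subset_size, subset_immune = 100, 30       # the subset being tested
genome_size, genome_immune = 20_000, 500   # "all human genes" background
list_size, list_immune = 1_000, 300        # the preselected gene list

p_genome = upper_tail_hypergeom(subset_immune, genome_size, genome_immune, subset_size)
p_list = upper_tail_hypergeom(subset_immune, list_size, list_immune, subset_size)

print(f"vs all genes:        p = {p_genome:.2e}")  # 30 observed vs 2.5 expected
print(f"vs preselected list: p = {p_list:.2f}")    # 30 observed vs 30 expected
```

The expected counts in the comments are simply subset size times the annotated fraction of each background (100 × 500/20,000 and 100 × 300/1,000).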

So far I have drawn a blank on online resources that allow this kind of analysis; DAVID (https://david.ncifcrf.gov/home.jsp), for example, does not seem to accept a background list of gene names either. So my question is whether there are any other good online resources that fit these requirements.

I'm aware of Cytoscape and GSEA, but was hoping for something simpler that is accessible online rather than a downloadable software package.

4.7 years ago

I would recommend trying out Enrichr or GATHER:

https://amp.pharm.mssm.edu/Enrichr/

https://changlab.uth.tmc.edu/gather/

However, those don't let you specify a background set. That often doesn't prevent you from getting useful results, but if you need one, you can specify a background set in IPA as well as in DAVID (IPA is commercial software, whereas the links above are for free programs).


Enrichr certainly looks nicely laid out and could be quite useful. GATHER doesn't look as polished but also seems to provide some interesting information. Both seem like decent tools to add to what I've already tried, so thanks for the suggestions!

The main issue, though, is that without a background list Enrichr in particular returns results very similar to those from FUMA (not particularly surprising, I suppose, as they rely on the same online resources). While useful, this doesn't solve the main problem: without the background list, the results are not very meaningful.

For example, both FUMA and Enrichr suggest that my subset of genes is enriched for immune response genes, but that is quite plausibly because the original gene list was already strongly enriched for immune response genes.


If you are analyzing something like an nCounter experiment (or a targeted gene panel), then you have a good point.

While certain library preparation methods (or array designs) might also benefit from a background adjustment, I think these programs can still be useful for hypothesis generation. Certain programs like goseq can also be automated for relatively quick results, but I usually find that Enrichr / GATHER give better results than goseq (even though goseq is supposed to be designed more specifically for sequencing experiments).

For example, even with a limited set of genes, you might want an idea of the annotations shared by even 2-3 related genes (even though such small overlaps can have a non-trivial false discovery rate). However, if all your tested genes are "immune" genes and a general category like "immune genes" is enriched, then you would have to look further down the ranked list of enriched categories and/or use another program.
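When many categories are tested at once, one common way to control the false discovery rate is the Benjamini-Hochberg adjustment. It is named here as one standard choice; the reply above does not say which correction any of these tools apply. A minimal pure-Python sketch, with made-up raw p-values:

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values, in the same order as the input."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for four annotation categories.
raw = [0.01, 0.04, 0.03, 0.20]
print([round(q, 4) for q in benjamini_hochberg(raw)])
```

Categories supported by only 2-3 genes tend to have modest raw p-values, so after adjustment they often fall above a typical 0.05 cutoff, which is the false discovery concern mentioned above.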

4.7 years ago

You definitely need a custom background list that reflects your preselection of genes.
GSEA should let you create a custom gene set, but you'll have to dig through the docs to find out how. In R, the topGO package also lets you work with a custom background gene set.
Another option is to implement over-representation tests yourself, e.g. in R; most are based on the hypergeometric distribution.
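A minimal sketch of such a do-it-yourself test, written in Python rather than R. The gene names and gene sets are hypothetical placeholders. Each gene set gets a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), and both the study genes and each set's members are first restricted to the custom background:

```python
from math import comb


def upper_tail_hypergeom(k, M, K, N):
    """P(X >= k) for X ~ Hypergeometric(M background genes, K annotated, N drawn)."""
    hi = min(K, N)
    if k > hi:
        return 0.0
    favourable = sum(comb(K, i) * comb(M - K, N - i) for i in range(max(k, 0), hi + 1))
    return favourable / comb(M, N)


def overrepresentation(study, background, gene_sets):
    """Test each gene set for over-representation in `study` against `background`.

    Returns {set_name: (overlap, set_size_within_background, p_value)}.
    """
    background = set(background)
    study = set(study) & background          # only genes that could have been selected
    M, N = len(background), len(study)
    results = {}
    for name, members in gene_sets.items():
        members = set(members) & background  # drop annotated genes outside the background
        K = len(members)
        k = len(study & members)
        results[name] = (k, K, upper_tail_hypergeom(k, M, K, N))
    return results


# Hypothetical example: 20 background genes, 5 of them in the study subset.
background = [f"G{i}" for i in range(1, 21)]
study = ["G1", "G2", "G3", "G4", "G5"]
gene_sets = {
    "set_A": ["G1", "G2", "G3", "G4", "G10"],
    "set_B": [f"G{i}" for i in range(6, 16)],
    "set_C": ["G1", "G19", "G30"],           # G30 is not in the background
}
for name, (k, K, p) in overrepresentation(study, background, gene_sets).items():
    print(f"{name}: {k}/{K} in study, p = {p:.3g}")
```

Restricting gene-set members to the background is the key step: it makes M and K describe the preselected list rather than the whole genome, so a category is only reported if it is over-represented relative to what the preselection already contained. With many gene sets, the p-values would still need a multiple-testing correction.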


Agreed. The analysis without the background list is somewhat interesting, but not particularly meaningful.

I have to say I was hoping there would be some online resources I could use rather than having to run anything in R or download packages, but that now seems unlikely: I couldn't find anything myself, and none of the resources I did find allow background lists.

Using topGO sounds like a decent solution. There is also a recent paper that shows how to carry out a series of functional enrichment tests (https://www.nature.com/articles/s41596-018-0103-9), so I might follow the pipeline they've created.

For the moment I intend to wait a little longer just in case I get lucky!
