GO analysis: DAVID vs GREAT vs GOrilla
8.2 years ago
biostart ▴ 370

Hello,

This is more of a discussion question. Could you please share your experience with GO analysis tools such as those listed in the title (or others), to help me decide which one to use? Here is my experience with the same dataset:

DAVID finds many terms that look relevant from a biological point of view, but several colleagues pointed out that DAVID uses outdated annotation databases, so I wanted to double-check.

GREAT did not find any of the pathways reported by DAVID (in fact, it did not find anything at all for this dataset).

GOrilla did find what DAVID found, and GOrilla's results seem even more biologically relevant (although this may just be my bias about "biological relevance").

So I need your expert advice. Thanks!

RNA-Seq ChIP-Seq GO gene ontology

I haven't used the others, but yes, don't use DAVID. It is woefully out of date, from what I understand.

I have spent a considerable amount of time investigating DAVID lately, because I am at the chapter in my book that covers GO. After all, if it were indeed as out of date as it sounds (it was published 6 years ago), then keeping it running while claiming that it is up to date (as the website says) would qualify as a type of scientific misconduct or negligence. Yet updating the GO annotations regularly should be very simple: one just needs to set up a cron job to replace the annotation file. AFAIK an old method with new data would be fine. Are they actually doing that? If so, at what frequency?

What I have found is that it is exceedingly difficult to pinpoint the data and the methods that the system is using. The whole thing is surprisingly cryptic. In addition, I've come to believe that their summaries and results are very "generous": there is always a lot of stuff there. I suspect that biologists like DAVID because it will always find something: it will support any preconceived notion one might have had.
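Whether a given annotation snapshot is stale can be checked from the file itself rather than from a website's claims. The sketch below assumes a GO Annotation File (GAF) whose header contains a `!date-generated:` line, as in the GAF 2.2 format; older files may use other header keys, so treat that key as an assumption. The function names and the example content are hypothetical.

```python
import io
from datetime import date


def gaf_generation_date(handle):
    """Return the date on a '!date-generated:' header line, or None if absent.

    Assumes GAF 2.2-style headers; scanning stops at the first non-'!' line,
    because annotation rows follow the header block.
    """
    for line in handle:
        if not line.startswith("!"):
            break
        key, _, value = line[1:].partition(":")
        if key.strip() == "date-generated":
            # Keep only YYYY-MM-DD in case a time of day is appended.
            return date.fromisoformat(value.strip()[:10])
    return None


def age_in_days(generated, today):
    """Number of days between the annotation snapshot and `today`."""
    return (today - generated).days


# Hypothetical, truncated GAF content (real rows have 17 tab-separated columns).
example = io.StringIO(
    "!gaf-version: 2.2\n"
    "!date-generated: 2016-01-15\n"
    "DB\tID1\tGENE_A\tinvolved_in\tTERM_1\n"
)
generated = gaf_generation_date(example)
print(generated, age_in_days(generated, date(2016, 6, 1)))  # 2016-01-15 138
```

A cron job that refreshes the annotation file could log this age after each download, so that staleness becomes a number rather than an impression.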

It's generous in that it ranks the returned results but doesn't return only the hits that pass adjusted p-value thresholds, although I believe the unadjusted p-values do have to be significant. Certainly I have seen it return no results for certain queries in different sections. Personally, while I know there are better tools out there, when biologists ask for a recommendation I still point them to DAVID, because they are primarily using it as a hypothesis-generation/exploratory tool anyway. I'm often using it that way as well. They then follow up by designing qPCR and wet-lab experiments. In that sense, I want a tool to err on the side of showing me more rather than less. Lots of biologically relevant results from large-scale experiments fail to pass p-value thresholds, especially when those thresholds or the p-values were calculated conservatively.
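To make the raw-versus-adjusted distinction concrete, here is a minimal over-representation sketch: a one-sided hypergeometric test per term plus Benjamini-Hochberg adjustment, with results ranked by raw p-value and the adjusted value reported alongside instead of being used as a filter. It only loosely mirrors the behaviour described above and is not DAVID's implementation (DAVID's default statistic is the EASE score, a modified Fisher exact test). All counts and term names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N background genes, K annotated, n in list)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


background, list_size = 20000, 200  # hypothetical sizes
terms = {  # term -> (list genes annotated to it, background genes annotated to it)
    "TERM_A": (12, 300),
    "TERM_B": (6, 250),
    "TERM_C": (3, 100),
}
raw = [hypergeom_upper_tail(k, background, K, list_size) for k, K in terms.values()]
adj = benjamini_hochberg(raw)
# Rank by raw p-value but keep every term, reporting the adjusted value alongside.
for name, p, q in sorted(zip(terms, raw, adj), key=lambda row: row[1]):
    print(f"{name}\traw p={p:.2e}\tBH q={q:.2e}")
```

Terms whose raw p-value falls below a cutoff while their adjusted value does not are exactly the "generous" extras discussed here; whether to follow them up is a judgment call, not something the statistics settle.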

@Istvan Albert,

biologists like DAVID because it will always find something: it will support any preconceived notion one might have had.

This is absolutely true. Recently I did some GO analysis with other tools, but we didn't find what we were looking for. Then, at the end, we used DAVID and, surprisingly, found what we were looking for. But is it appropriate to stick with this result? Any comments?

@venu and @Dan Gaston, as with anything else in the world, those who know what they are doing do not need much hand-holding and protection from being misled. I for one have minimal respect for p-values anyway, since the tiny ones are always unrealistic while others can be overly penalized (as pointed out).

I think all these tools should be used as one piece of evidence rather than as the final result that the analysis builds upon.

Also, there is a fuzzier issue in play. If one already suspects a well-defined event or phenomenon, then that event is actually more likely to be true than not (there was probably a reason to expect that outcome). At that point one would not actually need the p-values corrected for multiple comparisons as if all outcomes were equally likely. After all, one would be checking the likelihood of a single, well-defined event. But how do we decide who had a well-defined idea to begin with and who is out there fishing for ideas?
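The arithmetic behind this point can be shown with the Bonferroni correction, used here only because it is the simplest family-wise correction (the tools discussed may adjust differently). The p-value of 0.004 and the test counts are hypothetical: the same p-value passes as a single pre-specified test but fails once it is one of many tested terms.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test cutoff that keeps the family-wise error rate at `alpha`."""
    return alpha / n_tests


alpha = 0.05
p_expected_term = 0.004  # hypothetical p-value for the one term suspected a priori

for n_tests in (1, 50, 5000):  # one planned test vs. scanning many GO terms
    cutoff = bonferroni_threshold(alpha, n_tests)
    verdict = "passes" if p_expected_term <= cutoff else "fails"
    print(f"{n_tests:>5} tests: cutoff={cutoff:.0e} -> {verdict}")
# Prints:
#     1 tests: cutoff=5e-02 -> passes
#    50 tests: cutoff=1e-03 -> fails
#  5000 tests: cutoff=1e-05 -> fails
```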

Absolutely agree. In my case, at least, I might only have a vague idea of what I am looking for. I find DAVID fast enough to be useful for getting an idea of likely candidate pathways or systems to follow up on.

You may want to try PANTHER: the enrichment tool on the GO Consortium website is provided via PANTHER. It also supports a wide range of input values.

I emailed the current DAVID team; they confirmed that there were "intermediate" updates in 2012 and that they are currently planning to finish an update by the end of Q1 this year.

Also relevant:

I have a plot of the cumulative amount of information in GO over the years; it tells you how much is missing from DAVID.

And another plot, which I am perhaps less certain of but which I feel really captures how citations in bioinformatics work: the citations for DAVID change inversely with the amount of knowledge that DAVID uses.

As a side note, does anybody know of a gene set enrichment package available in R/Bioconductor? I have a pipeline built on DAVID, but I want to replace it with something that uses an updated database.

clusterProfiler supports over-representation analysis as well as gene set enrichment analysis, which are (slightly) different methods for analyzing gene lists. A notable feature: you can provide your own database of sets, i.e., analyze things other than just Gene Ontology terms or pathways.
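The difference between the two methods is easiest to see in code. Over-representation analysis tests a thresholded gene list against each set (for example with a hypergeometric test), whereas gene set enrichment analysis walks down the full ranked list. Below is a minimal sketch of the classic unweighted, Kolmogorov-Smirnov-like running-sum enrichment score for one user-defined gene set. It is not clusterProfiler's implementation: standard GSEA, including clusterProfiler's `GSEA()`, weights hits by the ranking statistic and assesses significance by permutation, neither of which is shown here. Gene names are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    ranked_genes: genes ordered from most up- to most down-regulated.
    gene_set: any user-defined collection of genes (GO term, pathway, custom set).
    Returns the running-sum value farthest from zero (positive: the set is
    concentrated at the top of the ranking; negative: at the bottom).
    """
    members = set(gene_set)
    hits = [gene in members for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranking without covering all of it")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up for a set member, down for a non-member; each walk totals 1.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


ranked = [f"G{i}" for i in range(1, 11)]  # hypothetical ranking, G1 most up-regulated
print(enrichment_score(ranked, {"G1", "G2", "G3"}))   # set at the top: close to +1
print(enrichment_score(ranked, {"G8", "G9", "G10"}))  # set at the bottom: close to -1
```

Because `gene_set` is just a collection of names, the same function works for GO terms, pathways or any custom set, which is the flexibility mentioned above.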

8.2 years ago
Ian 6.0k

Some very simple differences between DAVID and GREAT from a ChIP-seq point of view are:

  • GREAT is restricted to three species (human, mouse and zebrafish).
  • GREAT has the advantage of accepting genomic coordinates and will decide on the "closest" genes, whereas DAVID requires gene symbols/identifiers as input.
  • ENRICHR is limited to human (I think), but looks very full-featured and, as Goutham pointed out, up to date.
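Accepting coordinates means the tool first has to map each region to genes. GREAT documents its own association rule (by default a "basal plus extension" regulatory domain around each gene), which is more involved than what is shown here; this minimal sketch only illustrates the simpler nearest-TSS idea. The TSS table and gene names are hypothetical.

```python
from bisect import bisect_left


def nearest_gene(region, tss_by_chrom):
    """Assign a (chrom, start, end) region to the gene with the closest TSS.

    tss_by_chrom maps chrom -> list of (tss_position, gene), sorted by position.
    Distance is measured from the region midpoint. Returns (gene, distance),
    or None when the chromosome has no annotated genes.
    """
    chrom, start, end = region
    sites = tss_by_chrom.get(chrom)
    if not sites:
        return None
    mid = (start + end) // 2
    positions = [pos for pos, _ in sites]
    i = bisect_left(positions, mid)
    # The closest TSS is either just before or just after the insertion point.
    candidates = [sites[j] for j in (i - 1, i) if 0 <= j < len(sites)]
    pos, gene = min(candidates, key=lambda site: abs(site[0] - mid))
    return gene, abs(pos - mid)


tss = {"chr1": [(1000, "GENE_A"), (50000, "GENE_B"), (120000, "GENE_C")]}
for peak in [("chr1", 48000, 49000), ("chr1", 100000, 101000), ("chr2", 500, 900)]:
    print(peak, nearest_gene(peak, tss))
```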

In fact, I will echo Goutham: I often use Enrichr, GOrilla, PANTHER and GREAT depending on my needs for biological interpretation, but I use them for exploration rather than for final assessment. To me, Enrichr is the most up-to-date and comprehensive tool as of now, although it supports only a few species, whereas DAVID supports many. Still, I am slowly moving away from DAVID and trying to convince the people in my wet lab not to use it either, since it is way out of date for me. Obviously I get flak because other GO tools do not always return significant terms, but at least they give indicative hints towards important, relevant biological terms. I have used GREAT mostly for my ChIP-seq data, but not for enrichment; rather, I used it to associate distal regulatory elements with genes, which I believe is what it is most apt for. One can always use a single enrichment tool, or run different GO tools and look for an overlap of enriched terms, but it depends entirely on the need. I am not sure the community has a truly comprehensive tool as of now.

Since my work is mostly restricted to mouse and human, I mostly go with PANTHER and Enrichr these days. But if you are looking for pathway-specific results, GO is not the right analytical approach for me; there are other pathway enrichment tools with much richer knowledge bases, I believe.
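Running several tools and keeping the overlap, as suggested above, is easy to script once each tool's significant terms are exported. A minimal sketch; the term IDs and tool results are hypothetical, and it assumes every tool reports terms under the same ID scheme. Tools built on different annotation releases may disagree on term IDs or memberships, which is part of the problem discussed in this thread.

```python
from collections import Counter


def consensus_terms(results_by_tool, min_tools=2):
    """Return (term, n_tools) for terms called significant by >= min_tools tools.

    results_by_tool maps a tool name to the term IDs it reported as significant.
    Sorted by agreement (most tools first), then by term ID for stable output.
    """
    counts = Counter(term for terms in results_by_tool.values() for term in set(terms))
    agreed = [(term, n) for term, n in counts.items() if n >= min_tools]
    return sorted(agreed, key=lambda item: (-item[1], item[0]))


results = {  # hypothetical significant terms exported from each tool
    "Enrichr": {"TERM_A", "TERM_B", "TERM_C"},
    "PANTHER": {"TERM_A", "TERM_C"},
    "GOrilla": {"TERM_A", "TERM_D"},
}
print(consensus_terms(results))  # [('TERM_A', 3), ('TERM_C', 2)]
```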

It seems the Ontologies tab in Enrichr has not been updated recently: its output shows 2015 at the end of each category name, whereas the Transcription and Pathways tabs show 2016. Does anyone have any idea about this?

8.2 years ago

GREAT is mainly for finding the genes associated with a set of genomic regions. The input for GREAT is, for example, the coordinates of regulatory regions, and it looks for the genes that are associated with those regulatory regions.

I do not know if it makes sense to compare DAVID and GREAT. BTW, what was your input for GREAT and DAVID?

You also need to check how each of them works. Direct gene enrichment analysis tools such as DAVID/ENRICHR/GOrilla will yield different results than indirect enrichment analysis tools such as GREAT.

From GREAT website:

GREAT assigns biological meaning to a set of non-coding genomic regions by analyzing the annotations of the nearby genes.

Yes, DAVID looks outdated. I use ENRICHR, which is very up to date, includes many categories, and is easy to use and pretty fast.

For DAVID, the input was just a list of gene names. For GREAT, it was a BED file with the coordinates of the promoters of the same genes. GREAT could not find anything even remotely biologically reasonable, while DAVID did.

I also tried GREAT with a large BED file of ChIP-seq peaks, and it said: "Your set hits a large fraction of the genes in the genome, which often does not work well with the GREAT". And then, again, GREAT did not find any GO terms.
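The comment does not say how the promoter BED file was built. One common way is a fixed window around each gene's transcription start site (TSS); below is a minimal sketch, assuming a table of 0-based TSS positions with strands and hypothetical defaults of 2 kb upstream and 500 bp downstream. It clips windows at position 0 but not at chromosome ends, which would need a chromosome-sizes file.

```python
def promoter_bed_lines(genes, upstream=2000, downstream=500):
    """Yield BED6 lines for a promoter window around each gene's TSS.

    genes: iterable of (name, chrom, tss, strand) with a 0-based TSS coordinate.
    upstream/downstream are hypothetical defaults; choose what your analysis needs.
    BED intervals are 0-based and half-open, so the TSS base is included via +1.
    """
    for name, chrom, tss, strand in genes:
        if strand == "+":
            start, end = tss - upstream, tss + downstream + 1
        elif strand == "-":
            # On the minus strand, "upstream" means higher coordinates.
            start, end = tss - downstream, tss + upstream + 1
        else:
            raise ValueError(f"unknown strand {strand!r} for {name}")
        start = max(0, start)  # clip at the chromosome start
        yield f"{chrom}\t{start}\t{end}\t{name}\t0\t{strand}"


genes = [("GENE_A", "chr1", 10000, "+"), ("GENE_B", "chr1", 500, "-")]  # hypothetical
for line in promoter_bed_lines(genes):
    print(line)
```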

Even though you gave the promoters of the same genes, those promoter coordinates might be associated with other genes (GREAT criteria such as genes within 5 kb up/downstream, the same TAD boundaries, etc.), which gives a very large list of genes. So you are comparing two tools run on two different gene sets.

The main purpose of GREAT, as far as I know, is to associate regulatory elements with genes, not to look for enrichment of genes directly.
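To see why promoter input can turn into a larger gene set, here is a minimal sketch using a plain window of 5 kb around each region (the 5 kb figure comes from the comment above; GREAT's actual default rule is different and more elaborate). Gene names and coordinates are hypothetical: the promoter of one gene also picks up a neighbouring gene whose TSS lies close by.

```python
def genes_within_window(region, tss_table, window=5000):
    """Return genes whose TSS falls within `window` bp of a (chrom, start, end) region.

    tss_table: iterable of (gene, chrom, tss). A simplified, hypothetical rule.
    """
    chrom, start, end = region
    return {
        gene
        for gene, g_chrom, tss in tss_table
        if g_chrom == chrom and start - window <= tss < end + window
    }


tss_table = [
    ("GENE_A", "chr1", 10000),
    ("GENE_B", "chr1", 12500),  # a close neighbour, e.g. a bidirectional promoter
    ("GENE_C", "chr1", 40000),
]
input_genes = {"GENE_A"}
promoters = [("chr1", 8000, 10501)]  # promoter window of GENE_A
associated = set().union(*(genes_within_window(r, tss_table) for r in promoters))
print(sorted(associated), "extra:", sorted(associated - input_genes))
```

With thousands of peaks instead of a few promoters, the same effect can cover a large fraction of all genes, which is consistent with the GREAT warning quoted earlier in this thread.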

It seems the Ontologies tab in Enrichr has not been updated recently: its output shows 2015 at the end of each category name, whereas the Transcription and Pathways tabs show 2016. Does anyone have an idea about this?

I can also see that, but at least they update on a regular basis, unlike other tools. Why don't you try the real-time tool Gene Set Clustering based on Functional annotation (GeneSCF), which I developed to overcome these problems? GeneSCF also supports multiple species/organisms from KEGG and Gene Ontology (~4000).

GeneSCF article

Also check Nature Methods: Impact of outdated gene annotations on pathway enrichment analysis

8.1 years ago
biostart ▴ 370

I tried ENRICHR; it looks very good in terms of available databases. I'd love it if it also had the list of genes in each category (currently missing).

May I ask what you mean by the list of genes for each category? If I'm not wrong, for GO enrichment, once you get a term you can also see in the output which genes from your list are annotated to that term.

I'm guessing they want the full gene list associated with each term. Enrichr only shows the genes that are also in your gene set.
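If the full membership of each term is what is needed, it can be rebuilt from a GO annotation file (GAF) independently of any web tool. A minimal sketch, assuming GAF 2.x column order (column 3 is the gene symbol, column 4 the qualifier, column 5 the GO ID) and skipping annotations qualified with NOT; the rows below are hypothetical and truncated to five columns. It reports both the full gene list per term and the overlap with a user's list, the latter being what Enrichr displays.

```python
import io
from collections import defaultdict


def genes_per_term(gaf_handle):
    """Map term ID -> set of gene symbols from GAF rows, ignoring NOT annotations."""
    term_genes = defaultdict(set)
    for line in gaf_handle:
        if line.startswith("!") or not line.strip():
            continue  # header or blank line
        cols = line.rstrip("\n").split("\t")
        symbol, qualifier, term = cols[2], cols[3], cols[4]
        if "NOT" in qualifier.split("|"):
            continue
        term_genes[term].add(symbol)
    return term_genes


gaf = io.StringIO(  # hypothetical rows, truncated to the first five GAF columns
    "!gaf-version: 2.2\n"
    "DB\tID1\tGENE_A\tinvolved_in\tTERM_1\n"
    "DB\tID2\tGENE_B\tinvolved_in\tTERM_1\n"
    "DB\tID3\tGENE_C\tNOT|involved_in\tTERM_1\n"
    "DB\tID4\tGENE_D\tenables\tTERM_2\n"
)
user_list = {"GENE_A", "GENE_D", "GENE_X"}
for term, members in sorted(genes_per_term(gaf).items()):
    overlap = members & user_list
    print(term, "all:", ",".join(sorted(members)), "in list:", ",".join(sorted(overlap)))
```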

Try GeneSCF, mentioned above. Apart from enrichment analysis, it can download all the genes associated with individual terms as a table/plain text (using the ‘prepare_database’ module).
