Hello,
This is a discussion-type question. Could you please share your experience with GO analysis tools such as those listed in the subject (or others), to help me decide which one to use? Here is my experience with the same dataset:
DAVID finds many things that look relevant from the biological point of view, but several colleagues pointed out that DAVID uses outdated databases, so I wanted to double-check.
GREAT did not find any of the pathways reported by DAVID (basically, it did not find anything at all for this dataset).
GOrilla did find what DAVID found, and GOrilla's results seem even more biologically relevant (although maybe this is just my bias about "biological relevance").
So, I need your expert advice. Thanks!
I haven't used the others, but yes, don't use DAVID. It is woefully out of date, from what I understand.
I have spent a considerable amount of time investigating DAVID lately, because I am at the chapter in my book that talks about GO. After all, if it were indeed as out of date as it sounds (it was published 6 years ago), then keeping it running while claiming that it is up to date (as the website says) would qualify as a type of scientific misconduct/negligence. But updating the GO annotations regularly should be very simple: one just needs to set up a cron job to replace the file. AFAIK an old method with new data would be fine. Are they actually doing that? And if so, at what frequency?
What I have found is that it is exceedingly difficult to pinpoint the data and the methods that the system is using. The whole thing is surprisingly cryptic. In addition, I've come to believe that their summaries and results are very "generous": there is always a lot of stuff there. I suspect that biologists like DAVID because it will always find something: it will support any preconceived notion one might have had.
It's generous in that it ranks the returned results but does not restrict them to hits that passed adjusted p-value thresholds, although I believe the unadjusted p-values do have to be significant; certainly I have seen it return no results for certain queries in different sections. Personally, while I know there are better tools out there, when biologists ask for a recommendation I still point them to DAVID, because they are primarily using it as a hypothesis-generation/exploratory tool anyway, and I'm often using it that way as well. They then follow up by designing qPCR and wet-lab experiments anyway. In that sense, I want a tool to err on the side of showing me more rather than less. Lots of biologically relevant results from large-scale experiments fail to pass p-value thresholds, especially when those thresholds or the p-values were calculated conservatively.
@Istvan Albert,
This is absolutely true. Recently I did some GO analysis with other tools, but we didn't find what we were looking for. So we used DAVID in the end and, surprisingly, it found something we were looking for. But is it appropriate to stick with this result? Any comments?
@venu and @Dan Gaston, as with anything else in the world, those who know what they are doing do not need much hand-holding or protection from being misled. I for one have minimal respect for p-values anyway, since the tiny ones are always unrealistic and others can be overly penalized (as pointed out).
I think all these tools need to be used as one piece of evidence rather than the final result that the analysis builds upon.
Also, there is a fuzzier issue in play. If one already suspects a well-defined event/phenomenon, then that event is actually more likely to be true than not (there was probably a reason to expect that outcome). At that point one would not actually need to have the p-values corrected for multiple comparisons as if all outcomes were equal; after all, one would be checking the likelihood of a single, well-defined event. How do we decide who had a well-defined idea to begin with and who is out there fishing for ideas ...
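To put rough, purely illustrative numbers on that: with a Bonferroni-style correction over, say, 10,000 GO terms, a raw p-value has to clear roughly 0.05/10,000 = 5e-6 to stay significant, whereas a single pre-specified hypothesis only has to clear 0.05.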
Absolutely agree. In my case at least, I often only have a vague idea of what I might be looking for. I find DAVID quick enough to be useful for getting an idea of likely candidate pathways or systems to look at for follow-up.
You may want to try Panther; the enrichment tool on the GO Consortium website is provided via Panther. They also support a wide range of inputs.
I emailed the people currently working on DAVID; they confirmed that there were "intermediate" updates in 2012 and that they are planning to have an update finished by the end of Q1 this year.
Also relevant:
I have a plot of the cumulative amount of information in GO over the years; it tells you how much is missing from DAVID.
And another plot that I am perhaps less certain of, but which I feel really captures how citations in bioinformatics work: the citations for DAVID change inversely with the amount of knowledge that DAVID uses.
As a side note, does anybody know of a gene set enrichment package available in R/Bioconductor? I have a pipeline built around DAVID, but I want to swap it out for something that uses an updated database.
clusterProfiler supports over-representation analysis as well as gene set enrichment analysis, which are (slightly) different methods for analyzing gene lists. Notable feature: you can provide your own database of gene sets, i.e., analyze things other than just Gene Ontology terms or pathways. A rough sketch of both modes is below.
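To make that concrete, here is a minimal sketch in R. It assumes human Entrez gene IDs and the org.Hs.eg.db annotation package; de_genes, ranked_genes, and my_gene_sets.csv are placeholder names for your own inputs, and the argument names should be checked against the clusterProfiler version you have installed.

    # Over-representation analysis against GO Biological Process terms.
    # de_genes is assumed to be a character vector of Entrez gene IDs.
    library(clusterProfiler)
    library(org.Hs.eg.db)

    ego <- enrichGO(gene          = de_genes,
                    OrgDb         = org.Hs.eg.db,
                    keyType       = "ENTREZID",
                    ont           = "BP",
                    pAdjustMethod = "BH",
                    readable      = TRUE)
    head(as.data.frame(ego))

    # Custom gene sets: a two-column data frame (term, gene) passed as
    # TERM2GENE lets you test against any collection you like.
    custom_sets <- read.csv("my_gene_sets.csv")  # hypothetical file with columns term, gene
    eora <- enricher(gene = de_genes, TERM2GENE = custom_sets)

    # Gene set enrichment analysis needs a decreasing, named numeric vector,
    # e.g. log2 fold changes named by Entrez ID (ranked_genes is assumed here).
    egsea <- gseGO(geneList      = ranked_genes,
                   OrgDb         = org.Hs.eg.db,
                   ont           = "BP",
                   pAdjustMethod = "BH")

The TERM2GENE route is also one way to sidestep the staleness problem discussed above: the two-column table can be rebuilt from a freshly downloaded GO annotation file whenever you like.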