7.0 years ago
firestar
I detect a few hundred significantly DE genes (qval < 0.05). When I run an enrichment analysis on these genes against the GO network (using various tools), I get around 50 or so GO terms, but none of them is significant (qval < 0.05). What might be the reasons for this? Is there anything that could be tweaked?
By "qval" I assume you mean the FDR-adjusted p-value. This may be one of the key reasons why your GO analysis is not returning anything significant: FDR correction is not the best method when looking at ontologies. Read on to see why...
Due to the True Path Rule, genes associated with a GO term are also associated with its parent terms (for more on this, see Chapter 22 of Dr. Draghici's book [7]). This means that simply performing an enrichment analysis for each GO term will count each gene many times, which is a serious problem (see Draghici, Chapter 24). Furthermore, testing the enrichment of all GO terms is not necessary and, because of the unavoidable multiple comparison curse, it increases the number of false positives reported. Luckily, one can leverage the structure and additional properties of GO to limit the number of tests performed, and therefore the number of comparisons one must correct for. In 2006, Alexa et al. [8] proposed two methods to accomplish this: "Elim" and "Weight."
For example, in iPathwayGuide and iVariantGuide we offer both methods, each of which follows the same outline.
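To make the double counting concrete, here is a minimal sketch of the True Path Rule followed by a plain per-term hypergeometric test. The three-term ontology, the gene names, and the helper functions are all made up for illustration; this is not how iPathwayGuide or any specific tool implements it.

```python
from math import comb

# Hypothetical toy ontology: term -> list of parent terms.
PARENTS = {
    "response to amphetamine": ["response to chemical"],
    "response to chemical": ["response to stimulus"],
    "response to stimulus": [],
}

# Hypothetical direct annotations: gene -> most specific terms it is annotated to.
DIRECT = {
    "g1": ["response to amphetamine"],
    "g2": ["response to amphetamine"],
    "g3": ["response to chemical"],
    "g4": ["response to stimulus"],
    "g5": [],
    "g6": [],
}


def ancestors(term):
    """Return the term itself plus every ancestor reachable through PARENTS."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def propagate(direct):
    """True Path Rule: a gene annotated to a term also belongs to all its ancestors."""
    term_genes = {term: set() for term in PARENTS}
    for gene, terms in direct.items():
        for term in terms:
            for t in ancestors(term):
                term_genes[t].add(gene)
    return term_genes


def hypergeom_pvalue(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K are annotated to the term."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


term_genes = propagate(DIRECT)
universe = set(DIRECT)
de_genes = {"g1", "g2"}  # hypothetical DE list

for term, genes in term_genes.items():
    k = len(genes & de_genes)
    p = hypergeom_pvalue(k, len(genes), len(de_genes), len(universe))
    print(f"{term}: {len(genes)} genes, {k} DE, p = {p:.3f}")
```

The same two DE genes contribute to the test of all three terms, so the three p-values are far from independent, which is the problem the methods below try to address.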
Elim
The Elim method assesses the significance of GO terms starting with the most specific terms first. The benefit of this approach is that it is easier to find specialized terms that are significant, e.g. "response to amphetamine" is more descriptive than "response to chemical." This approach provides a very nice custom cut through the GO hierarchy that "magically" identifies the lowest level of abstraction that contains the significant GO terms in the given experiment.
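The bottom-up idea can be sketched roughly as follows. This is a simplified illustration, not the published algorithm in full: the two-term ontology, gene names, and the 0.05 cutoff are assumptions, and the sketch keeps the gene universe and the DE list size fixed while removing genes only from the annotated set of ancestor terms.

```python
from math import comb

# Hypothetical toy ontology: term -> parent terms (the root has none).
PARENTS = {
    "response to amphetamine": ["response to chemical"],
    "response to chemical": [],
}

# Hypothetical data; TERM_GENES is assumed to be already propagated to ancestors.
UNIVERSE = {f"g{i}" for i in range(100)}
TERM_GENES = {
    "response to amphetamine": {f"g{i}" for i in range(5)},
    "response to chemical": {f"g{i}" for i in range(20)},
}
DE_GENES = {f"g{i}" for i in range(5)} | {"g60", "g61", "g62"}


def hypergeom_pvalue(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K are annotated to the term."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def depth(term):
    """Longest path from the term up to a root; roots have depth 0."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + max(depth(p) for p in parents)


def strict_ancestors(term):
    """All ancestors of a term, excluding the term itself."""
    found = set()
    stack = list(PARENTS[term])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(PARENTS[current])
    return found


def classic(term_genes, de_genes, universe):
    """Test every term independently, ignoring the hierarchy."""
    return {
        term: hypergeom_pvalue(len(genes & de_genes), len(genes), len(de_genes), len(universe))
        for term, genes in term_genes.items()
    }


def elim(term_genes, de_genes, universe, alpha=0.05):
    """Test the deepest terms first; genes of a significant term are hidden from its ancestors."""
    removed = {term: set() for term in term_genes}
    pvalues = {}
    for term in sorted(term_genes, key=depth, reverse=True):
        genes = term_genes[term] - removed[term]
        k = len(genes & de_genes)
        pvalues[term] = hypergeom_pvalue(k, len(genes), len(de_genes), len(universe))
        if pvalues[term] < alpha:
            for ancestor in strict_ancestors(term):
                removed[ancestor] |= genes
    return pvalues


classic_p = classic(TERM_GENES, DE_GENES, UNIVERSE)
elim_p = elim(TERM_GENES, DE_GENES, UNIVERSE)
for term in TERM_GENES:
    print(f"{term}: classic p = {classic_p[term]:.2e}, elim p = {elim_p[term]:.2e}")
```

In this toy data the parent term looks enriched under the classic test only because it inherits the genes of its child; once those genes are eliminated, the parent no longer stands out and only the specific term is reported.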
Weight
Given a set of related GO terms, the Weight method is designed to identify the term that best represents the genes of interest, regardless of where the term falls in the hierarchy. This approach is less stringent than Elim, capturing more true positives with the drawback of including additional false positives.
References
Yes, by q-value I mean the Benjamini-Hochberg adjusted p-value, which I think is the same as the FDR-adjusted p-value. But this is a new insight. Thanks for this.
Why do you have to find significantly enriched GO terms? There are many reasons why no significant enrichment is found, and that is a perfectly acceptable result. GO annotations incompletely capture current knowledge. If you're looking at something new, genes may not be well annotated with the corresponding terms, so you will most likely not see any enrichment, because the usual approaches favour gene sets with many annotations (see this paper). Also, your threshold of 0.05 is entirely arbitrary. What if you had terms with a q-value of 0.0509? The values you get also depend on the approach to enrichment analysis you take. Many approaches perform a lot of unnecessary tests, which reduces detection power. For example, if your experiment is only concerned with cellular functions, you don't need to test GO terms such as foraging behaviour, i.e. in general, terms that are not below the cellular process term. In addition, many methods perform unnecessary tests by not taking into account redundancies in the annotations (see this paper).
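The effect of unnecessary tests on detection power can be illustrated with a small sketch of the Benjamini-Hochberg adjustment. The raw p-values below are invented for illustration, and the `benjamini_hochberg` helper is a plain textbook implementation, not the code of any particular enrichment tool.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for three terms below "cellular process".
relevant = [0.001, 0.004, 0.02]
# Hypothetical raw p-values for 97 terms that are irrelevant to the experiment.
irrelevant = [0.5 + 0.005 * i for i in range(97)]

q_restricted = benjamini_hochberg(relevant)
q_all = benjamini_hochberg(relevant + irrelevant)[:3]

print("tested alone:        ", [round(q, 3) for q in q_restricted])
print("tested with 97 more: ", [round(q, 3) for q in q_all])
```

The same three raw p-values pass the 0.05 cutoff when only the relevant terms are tested and fail it when they are corrected alongside the irrelevant ones. The restriction only makes sense if the set of terms to test is chosen before looking at the results.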
Hmmm... I am looking for significantly enriched or depleted GO terms because I don't want the result to be due to chance. If I take some random genes, there is a chance that I am going to get some GO terms. But is that reliable? I understand 0.05 is arbitrary, but my q-values for the returned GO terms are 0.8, 0.9, etc., so even if I relaxed 0.05 to 0.06 or even 0.1, it wouldn't make any difference. Another reason is that I have two datasets (different tissues) and this issue occurs with only one of them. As you say, there are different implementations of enrichment analysis; I have tried a few different ones (DAVID, clusterProfiler, goana, ClueGO, etc.). Although the lists of GO terms differ quite a bit, none of the terms is significant.
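One way to look at the "random genes" concern directly is to build an empirical null: draw random gene sets of the same size and see how often their best term p-value is as small as the observed one. The sketch below assumes a made-up universe of 200 genes and ten overlapping hypothetical terms; the function names and the number of draws are illustrative choices only.

```python
import random
from math import comb

# Hypothetical universe and term annotations (overlapping windows of 20 genes).
UNIVERSE = [f"g{i}" for i in range(200)]
TERM_GENES = {
    f"term{t}": {f"g{i}" for i in range(t * 10, t * 10 + 20)} for t in range(10)
}


def hypergeom_pvalue(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K are annotated to the term."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def min_pvalue(gene_set, term_genes, universe_size):
    """Smallest enrichment p-value over all terms for one gene set."""
    return min(
        hypergeom_pvalue(len(genes & gene_set), len(genes), len(gene_set), universe_size)
        for genes in term_genes.values()
    )


def empirical_pvalue(gene_set, term_genes, universe, n_draws=200, seed=0):
    """Fraction of random same-size gene sets whose best p-value is at least as small."""
    rng = random.Random(seed)
    observed = min_pvalue(gene_set, term_genes, len(universe))
    hits = 0
    for _ in range(n_draws):
        random_set = set(rng.sample(universe, len(gene_set)))
        if min_pvalue(random_set, term_genes, len(universe)) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_draws + 1)


de_genes = {f"g{i}" for i in range(10)}  # hypothetical DE list, all inside term0
p_emp = empirical_pvalue(de_genes, TERM_GENES, UNIVERSE)
print(f"empirical p-value of the best term: {p_emp:.4f}")
```

If a real DE list gives a best term no better than what random sets of the same size produce, that supports reading the result as "no enrichment" rather than as a tool problem.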
I don't know about others, but each time I do gene enrichment analysis, I come back disappointed by results that mostly never make sense and only add to the confusion. Do any unbiased gene enrichment on a large chunk of genes, and cancer and immune pathways always come back. One thing that equally worries me is that I have heard how some people, even in clinical settings, are forming conclusions based on in silico gene enrichment results.
If you cannot even get a significant enrichment term, then I would suggest doing a manual literature search and switching off autopilot for the remainder of your study. I don't mean to be critical or anything, but I have fundamental doubts about gene enrichment based on my own and others' experiences. Nor do I want to sound old (mid 30s), but I remember the days when we had to do literature searches, and it was actually fun trying to piece together the jigsaw. I really worry about how technology is attempting to replace our creative brains.
Your reply reminds me of an email I got some time back from the author of a popular GO enrichment tool. Quote:
"But it's rare that anybody interprets GO enrichment analysis as being actionable information. They tend to run it, mention it in their paper as justifying the results as being "reasonable", and then pretty much ignore it when they plan their next experiments. It beats staring at the gene list, but it's actually not all that useful since it depends heavily on what GO has decided to annotate - which is not static."
Could be true what s/he says!
Need some more details. The choice of tool matters depending on the type of organism you are looking at.
1) Which tools did you use? 2) Which parameters did you use in each tool?