Question

Bias in GO enrichment analysis for non-model organisms?



4.8 years ago

yh362 ▴ 50

I am currently working on a non-model plant species. After running InterProScan, I realized that only a little over half of the genes (~38,000 out of 63,000) have at least one GO term assigned. So if I were to run a GO enrichment analysis, some genes of interest (say, differentially expressed genes) may have no GO term associated with them, and I suppose that information would be lost in the enrichment analysis. So is GO enrichment analysis inherently biased/unreliable for non-model organisms? If someone could point me to papers that discuss this, that would be very helpful. If I am wrong, please correct me, since I am new to this kind of analysis. Thanks in advance!

gene ontology GO enrichment topGO interproscan • 1.3k views

updated 4.8 years ago by Philipp Bayer 8.4k • written 4.8 years ago by yh362 ▴ 50

score 0 · Answer 1 · 2019-07-15

Yes, this is an important point that is often ignored in plant genomics papers! Here's a paper discussing bias in the GO database, with a bit on the bias introduced by unannotated genes, though it is all in human: https://www.nature.com/articles/s41598-018-23395-2

The other problem is that most of the genes your GO terms are lifted over from are Arabidopsis genes, and that will introduce further bias: quite a few Arabidopsis genes have different functions in close relatives (example: https://www.nature.com/articles/hortres201454 ).

As far as I know, there is no package that takes this uncertainty into account, so yeah, if you redo your GO enrichment analysis from scratch in 10 years, chances are you'll get different results.
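To get a feel for how much the unannotated genes can matter, here is a minimal Python sketch (plain standard library, not taken from any GO package). It runs the same one-sided hypergeometric enrichment test twice: once with only the annotated genes as the background, and once with all genes as the background. The 63,000 and 38,000 gene counts come from the question; the DE set size, the number of annotated DE genes, and the term counts are made-up numbers for illustration only. It does not model annotation uncertainty itself, it only shows that the choice of gene universe can shift the p-value.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N genes, K of which carry the term."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Counts from the question.
ALL_GENES = 63000
ANNOTATED_GENES = 38000

# Hypothetical numbers, chosen only to illustrate the effect.
DE_GENES = 1000          # differentially expressed genes
DE_ANNOTATED = 400       # DE genes that have at least one GO term
TERM_GENES = 200         # genes annotated with the GO term being tested
TERM_IN_DE = 8           # DE genes annotated with that term

# Background = annotated genes only; the DE set shrinks to its annotated part.
p_annotated = hypergeom_sf(TERM_IN_DE, ANNOTATED_GENES, TERM_GENES, DE_ANNOTATED)

# Background = all genes; unannotated genes are treated as "not in the term".
p_all = hypergeom_sf(TERM_IN_DE, ALL_GENES, TERM_GENES, DE_GENES)

print(f"annotated-only background: p = {p_annotated:.3g}")
print(f"all-genes background:      p = {p_all:.3g}")
```

With these made-up counts the DE set is less well annotated than the genome as a whole, so the two backgrounds give noticeably different p-values for the same term. Whichever universe you pick, it is worth stating it explicitly in the methods.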