I am currently working on a non-model plant species, and after running InterProScan I realized that only a little over half of the genes (~38,000 out of 63,000) get at least one GO term assigned. So if I were to do a GO enrichment analysis, some genes of interest (say, differentially expressed genes) may not have any GO term associated with them and, I suppose, that information would be lost in the enrichment analysis. So is GO enrichment analysis inherently biased/unreliable for non-model organisms? If someone can point to some papers that discuss this, that would be very helpful. If I am wrong, please correct me, since I am new to this kind of analysis. Thanks in advance!
Yes, this is an important point that is often ignored in plant genomics papers! Here's a paper discussing GO database bias, with a bit on the bias introduced by unannotated genes, though it is all in human: https://www.nature.com/articles/s41598-018-23395-2
The other problem is that most of the genes your GO terms are lifted over from are Arabidopsis genes, and that will introduce further bias: there are quite a few Arabidopsis genes which have different functions in close relatives (example: https://www.nature.com/articles/hortres201454 )
As far as I know, there is no package which takes this uncertainty into account, so yeah, if you redo your GO enrichment analysis from scratch in 10 years, chances are you'll get different results.
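To get a feel for how much the unannotated genes can matter, here is a rough sketch of a one-sided hypergeometric (Fisher-style) enrichment test run against two different backgrounds: all 63,000 genes versus only the 38,000 with at least one GO term. Only those two totals come from the question. The DE gene counts, the GO term size and the function name `enrichment_p` are made up for illustration, and this is not what any particular enrichment package does internally.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    k: study-set genes annotated with the GO term
    n: size of the study set (e.g. DE genes)
    K: genes annotated with the GO term in the background
    N: size of the background (the gene "universe")
    """
    # Sum the upper tail exactly with integer arithmetic, divide once at the end.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Totals from the question; everything else below is hypothetical.
ALL_GENES = 63000
ANNOTATED_GENES = 38000

TERM_SIZE = 400      # hypothetical: genes carrying the GO term
DE_TOTAL = 1000      # hypothetical: all differentially expressed genes
DE_ANNOTATED = 800   # hypothetical: DE genes with at least one GO term
DE_IN_TERM = 20      # hypothetical: DE genes carrying the GO term

# Background = every gene, so unannotated genes count as "not in term".
p_all = enrichment_p(DE_IN_TERM, DE_TOTAL, TERM_SIZE, ALL_GENES)

# Background = annotated genes only, so unannotated DE genes are dropped.
p_annotated = enrichment_p(DE_IN_TERM, DE_ANNOTATED, TERM_SIZE, ANNOTATED_GENES)

print(f"all genes as background:      p = {p_all:.3g}")
print(f"annotated genes as background: p = {p_annotated:.3g}")
```

In this made-up case the DE genes are annotated more often than the genome average, so counting unannotated genes as "not in the term" makes the term look more enriched than it does against the annotated-only background. Neither number fixes the underlying problem that the unannotated genes may well belong to the term, but comparing the two gives a quick sense of how sensitive a result is to the choice of background.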