How to avoid misannotated GO terms?
11 weeks ago
Dr.Animo ▴ 80


I am doing GO enrichment analysis for a newly annotated plant genome. I ran BLAST against Swiss-Prot plant proteins and extracted the GO IDs of the matching hits. I noticed that some proteins appear to be misannotated, for example:

P04145 (Assigned GO term peribacteroid membrane)

P84795 (Assigned GO term blood coagulation)


There are multiple entries like this. How can I avoid these misannotations? Is there a list of plant-specific GO terms available?


swissprot GO enrichment
11 weeks ago

Please contact the UniProt helpdesk whenever you find such annotations, especially in cases like these where the GO evidence/source tag says "UniProtKB-Subcell" or "UniProtKB-Keyword". I think you have already done so, since we were contacted about P04145, but I am writing this for all other BioStar users who may read it.

Both cases are edge cases, and we are looking into them:

P84795 has this "Function" annotation, which explains the keyword "blood coagulation": "Potent inhibitor of serine proteases plasma kallikrein, plasmin and coagulation factor XIIa. Weak inhibitor of serine proteases trypsin and coagulation factor Xa." I will bring the question of the taxonomic scope of "blood coagulation" to the attention of curators.

P04145 has the subcellular location "peribacteroid membrane". If you read the definition of this term and how it is used in UniProt, you will find: "Symbiosis leads to the formation of a new compartment in the plant cell when bacteria enter the plant cell by endocytosis, the symbiosome." Finding this term on a plant protein is therefore not ruled out. One of our biocurators has been looking into this for P04145 and will contact you with the outcome.

Please don't hesitate to send us the full list of entries where you have doubts about the GO terms / Keywords / Subcellular location.
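To compile such a list, something like the following sketch may help. It scans a GO annotation file and collects the rows whose source tag points to a keyword or subcellular-location mapping. It assumes a tab-separated, GAF 2.x-style layout (object ID in column 2, GO ID in column 5, evidence code in column 7, With/From in column 8), and the tag spellings in SUSPECT_SOURCES are assumptions that you should adjust to whatever appears in your file. The sample rows are made-up placeholders, not real annotations.

```python
import io

# Assumed spellings of the source tags; adjust to match your annotation file.
SUSPECT_SOURCES = ("uniprotkb-kw", "uniprotkb-keyword", "uniprotkb-subcell")


def flag_annotations(handle, sources=SUSPECT_SOURCES):
    """Return (accession, go_id, evidence, with_from) for rows whose
    With/From column mentions one of the given source tags."""
    flagged = []
    for line in handle:
        # Skip GAF header/comment lines and blank lines.
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not enough columns to be a GAF-style row
        accession, go_id, evidence, with_from = cols[1], cols[4], cols[6], cols[7]
        if any(tag in with_from.lower() for tag in sources):
            flagged.append((accession, go_id, evidence, with_from))
    return flagged


# Placeholder rows (hypothetical accessions and IDs, for illustration only).
SAMPLE_GAF = "\n".join([
    "!gaf-version: 2.2",
    "\t".join(["UniProtKB", "ACC0001", "sym1", "", "GO:0000000", "REF:1", "IEA", "UniProtKB-KW:KW-0000", "P"]),
    "\t".join(["UniProtKB", "ACC0002", "sym2", "", "GO:0000000", "REF:2", "IEA", "UniProtKB-SubCell:SL-0000", "C"]),
    "\t".join(["UniProtKB", "ACC0003", "sym3", "", "GO:0000000", "REF:3", "IEA", "InterPro:IPR000000", "F"]),
])

flagged = flag_annotations(io.StringIO(SAMPLE_GAF))
for row in flagged:
    print("\t".join(row))
```

The flagged rows are only candidates for review; as the two cases above show, a term that looks out of place may still turn out to be justified.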

11 weeks ago

The problem most likely is with the BLAST alignment.

BLAST is a local aligner. Getting a hit to another sequence does not mean that the entire sequence matches, or that the functional annotations can be directly inferred from the BLAST hit alone.
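One way to act on this is to keep only hits that cover most of the query before transferring any GO terms. The sketch below is a minimal example, not a recommended protocol: it assumes tabular BLAST output with the 12 standard columns plus the query length as a 13th column (for example produced with -outfmt "6 std qlen"), and the thresholds are arbitrary placeholders. The function name and the sample rows are hypothetical.

```python
import csv
import io


def filter_hits(handle, min_identity=40.0, min_query_cov=0.7, max_evalue=1e-5):
    """Yield (query, subject) pairs that pass identity, coverage and E-value cutoffs.

    Assumes columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore qlen
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        qseqid, sseqid = row[0], row[1]
        pident = float(row[2])
        qstart, qend = int(row[6]), int(row[7])
        evalue = float(row[10])
        qlen = int(row[12])
        # Fraction of the query covered by this single alignment (HSP).
        query_cov = (abs(qend - qstart) + 1) / qlen
        if pident >= min_identity and query_cov >= min_query_cov and evalue <= max_evalue:
            yield qseqid, sseqid


# Made-up rows for illustration only (not real BLAST output).
SAMPLE = "\n".join([
    "\t".join(["gene1", "subjA", "62.5", "200", "75", "0", "5", "204", "1", "200", "1e-50", "180", "210"]),
    "\t".join(["gene2", "subjB", "35.0", "40", "26", "0", "10", "49", "1", "40", "1e-3", "30", "300"]),
])

kept = list(filter_hits(io.StringIO(SAMPLE)))
print(kept)
```

Note that the coverage here is computed per alignment, so a query matched by several short HSPs to the same subject is judged on each HSP separately. Even a full-length hit is only a hint of shared function, not proof.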

How to figure out which functions cannot exist in a species (e.g. blood coagulation should not be assigned to any plant) is a different question.

What I would do is to select the proper background for your enrichment study.

That is, select a closely related plant species and use its genes/functions as the background, rather than using everything.
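To show why the background matters, here is a minimal sketch of a one-sided hypergeometric enrichment test for a single GO term, using only the Python standard library. The function name and the gene IDs are hypothetical, and no multiple-testing correction is applied; a real analysis would normally rely on an established enrichment tool.

```python
from math import comb


def enrichment_pvalue(study, background, annotated):
    """P(X >= k) under the hypergeometric distribution.

    study      -- genes of interest
    background -- all genes that could have been selected
    annotated  -- genes carrying the GO term being tested
    Genes outside the background are ignored.
    """
    background = set(background)
    study = set(study) & background
    annotated = set(annotated) & background
    n_bg, n_term, n_study = len(background), len(annotated), len(study)
    k = len(study & annotated)
    # Sum the upper tail: k or more annotated genes in a draw of n_study.
    tail = sum(
        comb(n_term, i) * comb(n_bg - n_term, n_study - i)
        for i in range(k, min(n_term, n_study) + 1)
    )
    return tail / comb(n_bg, n_study)


# Toy example with hypothetical gene IDs.
background = [f"g{i}" for i in range(1, 11)]   # 10 background genes
annotated = ["g1", "g2", "g3"]                 # 3 carry the term
study = ["g1", "g2", "g4", "g5"]               # 4 genes of interest, 2 carry the term

p = enrichment_pvalue(study, background, annotated)
print(p)
```

Because the background size and the number of annotated genes in it enter the calculation directly, changing the background changes the p-value for the same study set.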

These kinds of misannotations are assigned to proteins from many plants, except for a few species such as Arabidopsis thaliana and Zea mays. The problem is that if I use only one of these species, I will probably lose terms that are valid plant GO terms but are not assigned to these plants.


