Different Protein GO Annotation Results Depending on Their Source: UniProt, AmiGO, QuickGO...
12.1 years ago
Pablo Pareja ★ 1.6k

Hi all,

I was wondering what the criteria are for selecting the Gene Ontology annotations shown in UniProt entries.

For instance, the entry with accession P12345 has the following information regarding GO annotations in the XML file:

<dbReference type="GO" id="GO:0005759" key="14">
<property type="term" value="C:mitochondrial matrix"/>
<property type="evidence" value="IEA:UniProtKB-SubCell"/>
</dbReference>
<dbReference type="GO" id="GO:0005886" key="15">
<property type="term" value="C:plasma membrane"/>
<property type="evidence" value="IEA:UniProtKB-SubCell"/>
</dbReference>
<dbReference type="GO" id="GO:0004069" key="16">
<property type="term" value="F:L-aspartate:2-oxoglutarate aminotransferase activity"/>
<property type="evidence" value="ISS:UniProtKB"/>
</dbReference>
<dbReference type="GO" id="GO:0030170" key="17">
<property type="term" value="F:pyridoxal phosphate binding"/>
<property type="evidence" value="IEA:InterPro"/>
</dbReference>
<dbReference type="GO" id="GO:0006457" key="18">
<property type="term" value="P:protein folding"/>
<property type="evidence" value="TAS:HGNC"/>
</dbReference>


However, on the entry page there is a "GO Complete Annotation" link leading to the QuickGO results for protein P12345, which include many more annotations (33). So the question is: why only 5 references in the XML, and why those in particular?
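To make the comparison with QuickGO or AmiGO concrete, GO cross-references like the ones in the XML fragment above can be listed together with their evidence codes. This is only a minimal sketch: it assumes the `dbReference` elements are wrapped in a hypothetical `<entry>` root with no XML namespace (the real UniProt XML declares a namespace, so the tag names would need adjusting), and `go_annotations` is an illustrative helper name, not part of any UniProt tool.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Three of the dbReference elements from the post, wrapped in a hypothetical
# <entry> root so the fragment parses on its own (no namespace assumed).
FRAGMENT = """
<entry>
  <dbReference type="GO" id="GO:0005759" key="14">
    <property type="term" value="C:mitochondrial matrix"/>
    <property type="evidence" value="IEA:UniProtKB-SubCell"/>
  </dbReference>
  <dbReference type="GO" id="GO:0004069" key="16">
    <property type="term" value="F:L-aspartate:2-oxoglutarate aminotransferase activity"/>
    <property type="evidence" value="ISS:UniProtKB"/>
  </dbReference>
  <dbReference type="GO" id="GO:0006457" key="18">
    <property type="term" value="P:protein folding"/>
    <property type="evidence" value="TAS:HGNC"/>
  </dbReference>
</entry>
"""

def go_annotations(xml_text):
    """Return (go_id, aspect, term, evidence_code, source) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for ref in root.iter("dbReference"):
        if ref.get("type") != "GO":
            continue
        props = {p.get("type"): p.get("value") for p in ref.findall("property")}
        # Split only on the first colon: term names may contain colons.
        aspect, term = props["term"].split(":", 1)
        code, source = props["evidence"].split(":", 1)
        rows.append((ref.get("id"), aspect, term, code, source))
    return rows

annotations = go_annotations(FRAGMENT)
by_evidence = Counter(row[3] for row in annotations)
for row in annotations:
    print(row)
print(dict(by_evidence))
```

Grouping by evidence code (IEA, ISS, TAS, ...) makes it easier to see which annotations a browser that hides automatic predictions would drop.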

Besides that, if we use the AmiGO service instead, we get 6 results.

So I have to say I'm a bit confused about all this. Why all these different results? What are the criteria behind it all?

Any ideas?

Cheers,

Pablo Pareja

gene uniprot annotation • 5.3k views

Pablo, could you please fix the link to the AmiGO results? There is a typo (missing g) in it, but even with that it doesn't work.


done, it should be working now. thanks for pointing it out ;)

12.1 years ago
Emily Dimmer ▴ 70

Dear Pablo,

The reason you are seeing these differences in GO annotations for P12345 is that databases make different decisions about which GO annotations to display to users. Quite often a gene product has many GO annotations contributed by different manual annotation efforts and automatic annotation methods, so a full GO annotation set can be large and redundant.

AmiGO does not currently display automatic annotation predictions for non-model organism species; therefore, the 6 annotations you see for the rabbit protein P12345 have all been manually curated.

In contrast, the UniProt group has decided to include most automatic annotation predictions in its web/XML cross-reference display. However, it reduces the number of GO annotation cross-references displayed in entries by filtering to prefer only those annotations that use the most informative GO terms or a 'high-quality' manual evidence code.

The full, redundant set of annotations for UniProtKB accessions can be obtained in two ways. One is to download the UniProt gene association file from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/gene_association.goa_uniprot.gz or ftp://ftp.geneontology.org/pub/go/gene-associations/submission/gene_association.goa_uniprot.gz

The other is to view or download subsets of annotations from the QuickGO GO browser (developed by the UniProtKB-GOA group): http://www.ebi.ac.uk/QuickGO/GProtein?ac=P12345

(This is the URL that the 'Complete GO annotation...' link on the UniProtKB web page for P12345 points to.)
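If you go the download route, the gene association file is a tab-separated GAF file, so pulling out the rows for one accession takes only a few lines. The following is a rough sketch rather than an official GOA tool: the function names are made up for illustration, the column positions assume the GAF 2.x layout, and the sample rows are invented purely to exercise the parser.

```python
import gzip
import io

# GAF 2.x is tab-separated. The 0-based columns used here are:
# 1 = DB object ID, 4 = GO ID, 6 = evidence code, 8 = aspect (C/F/P).
def annotations_for(lines, accession):
    """Yield (go_id, evidence_code, aspect) for one accession."""
    for line in lines:
        if line.startswith("!"):  # header and comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) > 8 and cols[1] == accession:
            yield cols[4], cols[6], cols[8]

def annotations_from_gzip(path, accession):
    """Stream a gzipped GAF file; `path` is wherever you saved the download."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from annotations_for(handle, accession)

# Invented sample rows (placeholder symbol, reference and date), NOT real
# GOA records; they only demonstrate the column handling.
ROWS = [
    ["UniProtKB", "P12345", "EXAMPLE", "", "GO:0005759", "GO_REF:0000000",
     "IEA", "", "C", "", "", "protein", "taxon:9986", "20100305", "UniProt"],
    ["UniProtKB", "Q00000", "OTHER", "", "GO:0005886", "GO_REF:0000000",
     "IEA", "", "C", "", "", "protein", "taxon:9986", "20100305", "UniProt"],
]
SAMPLE = io.StringIO(
    "!gaf-version: 2.0\n" + "".join("\t".join(row) + "\n" for row in ROWS)
)

hits = list(annotations_for(SAMPLE, "P12345"))
print(hits)
```

Streaming line by line matters here because the full UniProt file is very large; `annotations_from_gzip` never loads it into memory.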

Best wishes,
Emily

12.1 years ago

GO is essentially a tree (strictly speaking, a directed acyclic graph, since a class can have more than one parent). The most significant information is in the most detailed levels, the leaves. If a gene is annotated to a detailed class, it also belongs to every class that contains that detailed class (the twigs, branches and trunk leading to it). The meaningful information really is furthest from the trunk, and that is what is causing your problem. UniProt rightly gives only the most meaningful classes; all others can be derived from them.
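The idea that the broader classes can be derived from the detailed ones can be sketched as follows. This is a toy illustration only: the parent map is a simplified assumption and not a copy of the real ontology, and `ancestors` and `most_specific` are hypothetical helper names, not UniProt's actual filtering code.

```python
# Toy parent map (child -> set of parents). The edges are simplified
# assumptions for illustration, not the real GO structure.
PARENTS = {
    "mitochondrial matrix": {"mitochondrion"},
    "mitochondrion": {"organelle", "cytoplasm"},  # a class may have two parents
    "plasma membrane": {"membrane"},
    "organelle": {"cellular component"},
    "cytoplasm": {"cellular component"},
    "membrane": {"cellular component"},
    "cellular component": set(),
}

def ancestors(term, parents=PARENTS):
    """All terms reachable by following parent links from `term`."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def most_specific(terms, parents=PARENTS):
    """Drop every term that is an ancestor of another term in the set."""
    terms = set(terms)
    implied = set()
    for term in terms:
        implied |= ancestors(term, parents)
    return terms - implied

annotated = {"mitochondrial matrix", "mitochondrion", "plasma membrane", "membrane"}
print(sorted(most_specific(annotated)))
# -> ['mitochondrial matrix', 'plasma membrane']
```

Going the other way, taking the union of `ancestors(t)` over the most specific terms recovers the redundant set, which is one reason a full listing can be much longer than a filtered one.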

The differences between AmiGO and UniProt are indeed surprising. You might expect different results caused by different versions of the Gene Ontology being used, where AmiGO uses the current version and UniProt the version available at the moment of curation. Since UniProt is actively curated, they may also have decided to leave out a class because the curators considered it wrong (although in that case they should have submitted that information to GO as well).

In reality, the UniProt results are more detailed than the AmiGO results. AmiGO seems to fix the results at one level. UniProt, for instance, gives "mitochondrial matrix" where AmiGO only finds "mitochondrion". That might also be the reason why AmiGO misses "plasma membrane" entirely: that level is probably not deep enough. AmiGO still has more results, but that is because one class can have multiple parents.

So I would say that UniProt actually does give the most useful information. The problem is not really related to XML, only to different ways to treat the GO hierarchy.

12.1 years ago
Jerven ▴ 650

Look at the dates: most of the missing GO statements were added on the 5th of March from Compara, while the current public UniProtKB release is from the 8th of February. Check again next week and see if the difference is still there.

For the rest, you should send these kinds of questions to help@uniprot.org.


Thanks for the info, I'll send an email to that address and will let you all know whenever I get an answer.

4.4 years ago
Neo • 0

These are some of the differences between EBI-GOA (QuickGO) and GO Central (AmiGO) when it comes to entities.

GO Central recommends that GAF annotations be made to genes, that is, to 1:1 equivalents. In GOA (and consequently in QuickGO), annotations are made to proteins, and there may be multiple proteins per gene, sometimes representing different isoforms. You will see this reflected in different annotation counts for mouse, for example.

This is a very important difference, one that users can see when comparing UIs, but more importantly, it is about the underlying datasets and whether a gene-centric or protein-centric worldview is chosen.
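A small sketch shows why the two worldviews give different counts. Everything in it is hypothetical: the protein and gene identifiers are placeholders, the mapping is invented, and `collapse_to_genes` is just an illustrative name, not how either resource builds its data.

```python
from collections import defaultdict

# Hypothetical protein -> gene mapping and protein-level annotations.
# The identifiers are placeholders, not real accessions.
PROTEIN_TO_GENE = {
    "PROT_A1": "GENE_A",
    "PROT_A2": "GENE_A",  # e.g. a second isoform of the same gene
    "PROT_B1": "GENE_B",
}
PROTEIN_ANNOTATIONS = [
    ("PROT_A1", "GO:0005759"),
    ("PROT_A2", "GO:0005759"),
    ("PROT_A2", "GO:0030170"),
    ("PROT_B1", "GO:0005886"),
]

def collapse_to_genes(annotations, protein_to_gene):
    """Merge protein-level (protein, GO ID) pairs into gene -> set of GO IDs."""
    by_gene = defaultdict(set)
    for protein, go_id in annotations:
        by_gene[protein_to_gene[protein]].add(go_id)
    return dict(by_gene)

gene_view = collapse_to_genes(PROTEIN_ANNOTATIONS, PROTEIN_TO_GENE)
protein_level_count = len(set(PROTEIN_ANNOTATIONS))
gene_level_count = sum(len(go_ids) for go_ids in gene_view.values())
print(protein_level_count, gene_level_count)  # 4 3
```

The same underlying evidence yields 4 protein-centric annotations but 3 gene-centric ones, because the term shared by the two isoforms is counted once per gene.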

Additionally, GO Central omits the majority of the sequences and IEA [electronic] annotations from UniProtKB from the weekly database builds due to the large size of the data set. For those species with a dedicated authoritative database group, such as Drosophila, mouse or Saccharomyces, UniProtKB annotations are collected and submitted by the dedicated group, and hence the UniProtKB IEA annotations for these species do appear in the GO database. As an NHGRI-funded resource, GO Central focuses on annotations that elucidate human genes or genes of relevance to human health in some way. GO Central also includes plants, as well as the 200 genomes of the "Quest for Orthologs" project. More data sets will be supported depending on available resources.