Where to find the *full* GO gene annotation file?
5.1 years ago
iamjli ▴ 10

Hi,

I've been struggling to find a full, unfiltered GO annotation file (GAF). A number of different GO enrichment sites point here to download the GAF file: http://www.geneontology.org/page/download-annotations

But this is apparently not complete. For instance, when I use GOrilla, GO:0010604 appears as an enriched term. However, GO:0010604 does not show up in the file from the Gene Ontology website.

What am I missing here? Where can I find the full file?

Thanks!

gene ontology GAF GO enrichment • 3.8k views

I do find GO:0010604 (positive regulation of macromolecule metabolic process) in AmiGO and the Ontology Lookup Service, so I wonder which Gene Ontology website you are referring to.


Yes, I'm wondering why AmiGO has it, but not the GAF file. I am referring to this file in particular: http://geneontology.org/gene-associations/goa_human.gaf.gz

5.1 years ago

Make sure that you distinguish between an annotation and a definition. GO:0010604 is a term, and its definition is present in the GO definition (ontology) file.

wget http://purl.obolibrary.org/obo/go.obo


Check the term of interest:

grep -A 3 "id: GO:0010604" go.obo


produces:

id: GO:0010604
name: positive regulation of macromolecule metabolic process
namespace: biological_process
def: "Any process that increases the frequency, rate or extent of the chemical reactions and pathways involving macromolecules, any molecule of high relative molecular mass, the structure of which essentially comprises the multiple repetition of units derived, actually or conceptually, from molecules of low relative molecular mass." [GOC:dph, GOC:tb]


Now, this term may not be present in an annotation file if no gene product has been annotated directly with it.
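To check whether a term is used directly in an annotation file, one option is to count how often each GO ID occurs in the GO ID column. The sketch below assumes the standard tab-separated GAF layout, where comment lines start with `!` and the GO ID is the fifth column. The function name `count_go_ids` and the sample rows are made up for illustration and are truncated to five columns.

```python
from collections import Counter


def count_go_ids(gaf_lines):
    """Count how many annotation rows directly cite each GO ID."""
    counts = Counter()
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header/comment lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 4:
            counts[fields[4]] += 1  # column 5 of a GAF row holds the GO ID
    return counts


# Made-up sample rows (real GAF rows have more columns than shown here).
sample = [
    "!gaf-version: 2.2\n",
    "UniProtKB\tP00001\tGENE1\tinvolved_in\tGO:0045893\n",
    "UniProtKB\tP00002\tGENE2\tinvolved_in\tGO:0045893\n",
]
counts = count_go_ids(sample)
print(counts["GO:0045893"], counts["GO:0010604"])
```

On a real download, the same function should work on a file handle, for example one opened with `gzip.open("goa_human.gaf.gz", "rt")`. A count of zero only means that the term has no direct annotations in that file.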


Yes, I understand the difference between the two files. Why does GO:0010604 appear in AmiGO and GOrilla, but not in the .gaf file (this is the one I'm referring to: http://geneontology.org/gene-associations/goa_human.gaf.gz)?


Ok, now I understand what you mean.

The GAF file is complete and non-redundant: it contains the minimal number of annotations necessary to annotate the data. For example, GO:0045893 is a more specific term that has GO:0010604 among its ancestors.

When the annotation file contains GO:0045893, the annotated genes are implicitly annotated with the ancestors of this term as well, but those ancestor entries are not written to the file. Tools like AmiGO search not just the leaf nodes but also the intermediate ones, in case there is support for those.
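The ancestor lookup described here can be sketched by reading the `is_a` lines from an OBO file and walking upward from a term. This is a minimal illustration, not how AmiGO itself is implemented. The helper names `parse_is_a` and `ancestors` are hypothetical, the term IDs in the sample are invented, and only `is_a` links are followed (tools often follow other relations such as `part_of` too).

```python
def parse_is_a(obo_lines):
    """Map each term ID to the set of its direct is_a parents."""
    parents = {}
    current = None
    in_term = False
    for raw in obo_lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            current = None
        elif in_term and line.startswith("id: "):
            current = line[4:]
            parents.setdefault(current, set())
        elif in_term and current and line.startswith("is_a: "):
            # "is_a: T:2 ! some name" -> keep only the ID
            parents[current].add(line[6:].split()[0])
    return parents


def ancestors(term, parents):
    """Collect every term reachable from `term` by following is_a links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Invented three-term ontology: T:3 is_a T:2 is_a T:1.
sample_obo = """[Term]
id: T:3
name: specific term
is_a: T:2 ! intermediate term

[Term]
id: T:2
name: intermediate term
is_a: T:1 ! general term

[Term]
id: T:1
name: general term
""".splitlines()

parents = parse_is_a(sample_obo)
print(sorted(ancestors("T:3", parents)))
```

Run against the real `go.obo`, `ancestors("GO:0045893", parents)` would be expected to include GO:0010604, which is why a gene annotated only with the specific term still counts toward the general one.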


Gotcha, thanks for the info, Istvan. I guess my question is then where can I download the annotations that AmiGO uses? I see they have an option to download filtered searches, but I'm looking for the entire GAF file.


AmiGO uses the same annotations. When it parses them, it builds a data model in the program itself, and that is the service it offers.
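If an expanded table is what you need, one way to approximate it yourself is to take the direct annotations from the GAF file and add every ancestor of each annotated term. The sketch below is an assumption about how such a propagation could look, not a description of AmiGO's internals. The function name `propagate`, the gene names, and the term IDs are all invented, and the parent map is assumed to have been built from the ontology file beforehand.

```python
from collections import defaultdict


def propagate(direct, parents):
    """Return gene -> terms, with all ancestors of each direct term added."""
    full = defaultdict(set)
    for gene, terms in direct.items():
        stack = list(terms)
        while stack:
            term = stack.pop()
            if term not in full[gene]:
                full[gene].add(term)
                # Queue the direct parents so their ancestors get added too.
                stack.extend(parents.get(term, ()))
    return dict(full)


# Invented data: T:3 is_a T:2 is_a T:1.
parents = {"T:3": {"T:2"}, "T:2": {"T:1"}, "T:1": set()}
direct = {"GENE1": {"T:3"}, "GENE2": {"T:2"}}

full = propagate(direct, parents)
for gene in sorted(full):
    print(gene, sorted(full[gene]))
```

In this toy example, T:1 ends up with two annotated genes even though no row cites it directly, which mirrors the situation with GO:0010604.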