I am using the gene2go file to obtain GO terms and the Entrez Gene IDs associated with those GO terms.
To make sure that gene2go was complete, I had a program count the GO terms with tax ID 9606, and the result was around 17k. When I ran an SQL query on the GO database to see how many GO terms are related to human, the result was around 19k.
I then compared these two datasets. They shared the majority of their terms but not all.
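The count and comparison described above can be sketched roughly as follows. This is only a minimal illustration, not the program I actually ran: it assumes gene2go is tab-separated with tax ID, gene ID and GO ID in the first three columns and a header line starting with `#`. The function names, the file name `gene2go.gz` in the comment, and the tiny `db_terms` set are all placeholders.

```python
import gzip
from typing import Dict, Iterable, Set


def go_terms_for_taxon(lines: Iterable[str], tax_id: str = "9606") -> Set[str]:
    """Collect the distinct GO IDs annotated to one taxon in gene2go-style lines."""
    terms = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the header line and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[0] == tax_id:
            terms.add(fields[2])  # third column holds the GO ID
    return terms


def compare_term_sets(gene2go_terms: Set[str], db_terms: Set[str]) -> Dict[str, Set[str]]:
    """Split two sets of GO IDs into shared terms and terms unique to each source."""
    return {
        "shared": gene2go_terms & db_terms,
        "only_in_gene2go": gene2go_terms - db_terms,
        "only_in_db": db_terms - gene2go_terms,
    }


# In-memory stand-in for the file; only the first three columns are filled in.
sample_lines = [
    "#tax_id\tGeneID\tGO_ID\n",
    "9606\t79085\tGO:0005509\n",
    "9606\t79085\tGO:0005515\n",
    "9606\t79086\tGO:0016021\n",
]

# Placeholder for the GO IDs returned by the SQL query on the GO database.
db_terms = {"GO:0005509", "GO:0005515", "GO:0051503"}

gene2go_terms = go_terms_for_taxon(sample_lines)
result = compare_term_sets(gene2go_terms, db_terms)
print(len(gene2go_terms), sorted(result["only_in_db"]))

# For the real file (path is an assumption):
# with gzip.open("gene2go.gz", "rt") as handle:
#     gene2go_terms = go_terms_for_taxon(handle)
```

The `only_in_db` set is the one I am asking about: terms the GO database reports for human that never appear in gene2go.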
At first I thought that some of the terms the database gave me didn't have any genes annotated to them, and that this was why those terms weren't in the gene2go file.
So I took some terms that are in the database but not in gene2go to check whether that was the case.
One of those terms was GO:0051503. I ran an SQL query on the GO database to see if there were any genes annotated to this term. The SQL query is given below:
SELECT term.name, species.ncbi_taxa_id, term.acc, term.term_type,
       gene_product.symbol AS gp_symbol, gene_product.symbol AS gp_full_name
FROM term
  INNER JOIN association ON (term.id=association.term_id)
  INNER JOIN gene_product ON (association.gene_product_id=gene_product.id)
  INNER JOIN species ON (gene_product.species_id=species.id)
  INNER JOIN dbxref ON (gene_product.dbxref_id=dbxref.id)
  INNER JOIN db ON (association.source_db_id=db.id)
WHERE term.acc = 'GO:0051503' AND species.ncbi_taxa_id = '9606';
This query returns the gene with symbol SLC25A23, which has Entrez Gene ID 79085.
But when I look at the latest gene2go file, there is no row with tax ID 9606, GO ID GO:0051503 and gene ID 79085. The part of the file where this entry should be is shown below:
taxID  entrez  goID
9606   79084   GO:1903508
9606   79085   GO:0002082
9606   79085   GO:0005347
9606   79085   GO:0005509
9606   79085   GO:0005515
9606   79085   GO:0005739
9606   79085   GO:0006851
9606   79085   GO:0015866
9606   79085   GO:0015867
9606   79085   GO:0036444
9606   79085   GO:0043457
9606   79085   GO:0051282
9606   79085   GO:0071277
9606   79085   GO:0097274
9606   79086   GO:0016021
As you can see, GO ID GO:0051503 is not there.
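To rule out simply overlooking the row, the same check can be done programmatically. The sketch below is only illustrative: it runs on the three-column excerpt shown above rather than on the full file, and the helper names `has_annotation` and `go_ids_for_gene` are made up for this example. It splits on any whitespace, so it should work on the tab-separated file as well, assuming the first three columns are tax ID, gene ID and GO ID.

```python
from typing import Iterable, List


def has_annotation(lines: Iterable[str], tax_id: str, gene_id: str, go_id: str) -> bool:
    """Return True if a row with this (tax ID, gene ID, GO ID) triple exists."""
    for line in lines:
        # Only the first three whitespace-separated fields matter here.
        if line.split(None, 3)[:3] == [tax_id, gene_id, go_id]:
            return True
    return False


def go_ids_for_gene(lines: Iterable[str], tax_id: str, gene_id: str) -> List[str]:
    """List the GO IDs annotated to one gene of one taxon, in file order."""
    found = []
    for line in lines:
        fields = line.split(None, 3)
        if fields[:2] == [tax_id, gene_id] and len(fields) >= 3:
            found.append(fields[2])
    return found


# The excerpt from the gene2go file, reduced to its first three columns.
excerpt = """\
9606 79084 GO:1903508
9606 79085 GO:0002082
9606 79085 GO:0005347
9606 79085 GO:0005509
9606 79085 GO:0005515
9606 79085 GO:0005739
9606 79085 GO:0006851
9606 79085 GO:0015866
9606 79085 GO:0015867
9606 79085 GO:0036444
9606 79085 GO:0043457
9606 79085 GO:0051282
9606 79085 GO:0071277
9606 79085 GO:0097274
9606 79086 GO:0016021
""".splitlines()

print(has_annotation(excerpt, "9606", "79085", "GO:0051503"))
print(len(go_ids_for_gene(excerpt, "9606", "79085")))
```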
Can anyone explain why that is? I ran my SQL queries both on a local GO database that I set up 2 days ago and with the GOOSE tool of AmiGO, so this can't be explained by the database being out of date. The only explanation seems to be that gene2go is incomplete, but that doesn't make sense.