Ultimate goal: cross-species network alignment for functional prediction. Current problem: mapping BioGRID (ENTREZ_GENE) IDs to/from GO term databases.
I’m trying to produce a GAF or gene2go type file for historical releases of the GO database. On the Gene Ontology Archive (http://archive.geneontology.org/full/), I can’t find any GAF or gene2go files, only SQL databases, which are huge and apparently require both SQL and Perl to regenerate the GAF files. That is too much work! So, first question: do GAF or gene2go files already exist for historical releases?
Then I found EBI’s releases (ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/old), which use UniProt/Swiss-Prot names. I have figured out how to map automatically between the various naming conventions using the IDENTIFIERS files that are released with BioGRID. However, I often find that a single name maps to several very different IDs, so I can’t figure out which BioGRID gene/protein is annotated with which GO terms. Here’s a very specific example:
From the 21 April, 2010 release at ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/old/UNIPROT/goa_uniprot_gcrp.gpi.174.gz there is the following line:
9606 B5MCF5 GO:0005634 IDA 0 Putative uncharacterized protein STON1-GTF2A1L C
From BioGRID’s IDENTIFIERS files, I find the following mappings for B5MCF5:
130414 B5MCF5 UNIPROT-ACCESSION
116226 B5MCF5 UNIPROT-ACCESSION
(I also can’t figure out how to make this editor break the lines rather than paragraphing them, sorry)
Unfortunately, those two BioGRID IDs on the left map to two different ENTREZ_GENE IDs:
116226 11037 ENTREZ_GENE
130414 286749 ENTREZ_GENE
So, should the above annotation be applied to Entrez gene 11037, or 286749?
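To make the ambiguity concrete, here is a minimal Python sketch that follows a UniProt accession through BioGRID IDs to Entrez gene IDs and flags cases with more than one candidate. It assumes the IDENTIFIERS rows are whitespace-separated with the three leading columns shown above (BioGRID ID, identifier value, identifier type); the function names `parse_identifiers` and `entrez_candidates` are hypothetical, and the real file may need different parsing.

```python
from collections import defaultdict

def parse_identifiers(lines):
    """Build lookup tables from rows of: biogrid_id, identifier value, identifier type.

    Assumption: columns are whitespace-separated, as in the excerpt in this post.
    """
    acc_to_biogrid = defaultdict(set)     # UniProt accession -> BioGRID IDs
    biogrid_to_entrez = defaultdict(set)  # BioGRID ID -> Entrez gene IDs
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed rows
        biogrid_id, value, id_type = parts[0], parts[1], parts[2]
        if id_type == "UNIPROT-ACCESSION":
            acc_to_biogrid[value].add(biogrid_id)
        elif id_type == "ENTREZ_GENE":
            biogrid_to_entrez[biogrid_id].add(value)
    return acc_to_biogrid, biogrid_to_entrez

def entrez_candidates(accession, acc_to_biogrid, biogrid_to_entrez):
    """Return every Entrez gene ID reachable from a UniProt accession."""
    genes = set()
    for biogrid_id in acc_to_biogrid.get(accession, ()):
        genes |= biogrid_to_entrez.get(biogrid_id, set())
    return sorted(genes, key=int)

# The four rows quoted in this post.
rows = [
    "130414 B5MCF5 UNIPROT-ACCESSION",
    "116226 B5MCF5 UNIPROT-ACCESSION",
    "116226 11037 ENTREZ_GENE",
    "130414 286749 ENTREZ_GENE",
]
acc_to_biogrid, biogrid_to_entrez = parse_identifiers(rows)
candidates = entrez_candidates("B5MCF5", acc_to_biogrid, biogrid_to_entrez)
if len(candidates) > 1:
    print("B5MCF5 is ambiguous:", candidates)
```

This only detects the one-to-many mappings; it does not decide which Entrez gene the annotation belongs to, which is exactly the question.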
Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time. Thank you!