Mapping Gene and Protein names between UniProt, Swiss-Prot, and Entrez
12 months ago
Ultimate goal: cross-species network alignment for functional prediction. Current problem: mapping BioGRID (ENTREZ_GENE) IDs to/from GO term databases.

I’m trying to produce a GAF or gene2go-type file for historical releases of the GO database. On the Gene Ontology Archive (http://archive.geneontology.org/full/), I can’t find any GAF or gene2go files, only SQL databases, which are huge and apparently require both SQL and Perl to regenerate the GAF files. Too much work! So, first question: do GAF or gene2go files already exist for historical releases?

Then I found EBI’s releases (ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/old), and they use UniProt/Swiss-Prot names. I have figured out how to automatically map between the various naming conventions using the IDENTIFIERS files that are released with BioGRID. However, I often find multiple mappings with very different IDs, so I can’t figure out which BioGRID gene/protein is annotated with which GO terms. Here’s a very specific example:

From the 21 April, 2010 release at ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/old/UNIPROT/goa_uniprot_gcrp.gpi.174.gz there is the following line:

9606     B5MCF5  GO:0005634      IDA     0       Putative uncharacterized protein STON1-GTF2A1L  C


From BioGRID’s IDENTIFIERS files, I find the following mappings for B5MCF5:

130414  B5MCF5  UNIPROT-ACCESSION
116226  B5MCF5  UNIPROT-ACCESSION


Unfortunately, those two BioGRID IDs on the left map to two different ENTREZ_GENE IDs:

116226  11037   ENTREZ_GENE
130414  286749  ENTREZ_GENE
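
To make the ambiguity concrete, here is a minimal Python sketch of the two-step lookup described above (UniProt accession to BioGRID ID, then BioGRID ID to Entrez gene ID). It assumes the IDENTIFIERS rows have already been reduced to the three columns shown in this post (BioGRID ID, identifier value, identifier type). The function name `uniprot_to_entrez` and the in-memory row list are only for illustration and are not part of any BioGRID tooling.

```python
from collections import defaultdict

# Rows copied from the post: (BioGRID ID, identifier value, identifier type).
# Assumption: the real IDENTIFIERS file has been parsed into tuples like these.
IDENTIFIER_ROWS = [
    ("130414", "B5MCF5", "UNIPROT-ACCESSION"),
    ("116226", "B5MCF5", "UNIPROT-ACCESSION"),
    ("116226", "11037", "ENTREZ_GENE"),
    ("130414", "286749", "ENTREZ_GENE"),
]


def uniprot_to_entrez(rows):
    """Map each UniProt accession to every Entrez gene ID reachable via a BioGRID ID."""
    biogrid_by_accession = defaultdict(set)
    entrez_by_biogrid = defaultdict(set)
    for biogrid_id, value, id_type in rows:
        if id_type == "UNIPROT-ACCESSION":
            biogrid_by_accession[value].add(biogrid_id)
        elif id_type == "ENTREZ_GENE":
            entrez_by_biogrid[biogrid_id].add(value)

    mapping = {}
    for accession, biogrid_ids in biogrid_by_accession.items():
        entrez_ids = set()
        for biogrid_id in biogrid_ids:
            entrez_ids |= entrez_by_biogrid.get(biogrid_id, set())
        mapping[accession] = entrez_ids
    return mapping


mapping = uniprot_to_entrez(IDENTIFIER_ROWS)

# Accessions that reach more than one Entrez gene cannot be assigned automatically.
ambiguous = {acc: sorted(ids) for acc, ids in mapping.items() if len(ids) > 1}
print(ambiguous)
```

Run on the four rows above, this flags B5MCF5 as reaching both Entrez genes. It only detects the one-to-many case; it does not decide which gene is correct.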


So, should the above annotation be applied to Entrez gene 11037, or 286749?

