I was writing this at the same time as Julian was adding his useful comment. I am also a Swiss-Prot annotation fan, but I can confirm that there are systematic problems with ID cross-mapping in general for a significant proportion of human proteins, as indicated by the following UniProt queries:
(organism:"Homo sapiens ") AND reviewed:yes = 20,237
(organism:"Homo sapiens ") AND reviewed:yes AND database:(type:ensembl) = 18,685
(organism:"Homo sapiens ") AND reviewed:yes AND database:(type:ensembl) AND database:(type:hgnc) AND database:(type:geneid) = 18,250
Ensembl 67.37 = 21,065, including 568 novel (i.e. not a 100% match to UniProt)
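Because each query adds another AND-ed cross-reference filter, the counts can only stay the same or shrink. A minimal sketch of that logic, assuming each reviewed entry is represented as the set of x-ref database types it carries (the accessions and data below are hypothetical toy values, not real UniProt records):

```python
from typing import Dict, Set


def count_with_xrefs(entries: Dict[str, Set[str]], required: Set[str]) -> int:
    """Count entries carrying every database type in `required` (AND semantics)."""
    return sum(1 for dbs in entries.values() if required <= dbs)


# Hypothetical toy data: accession -> set of x-ref database types present.
toy_entries = {
    "ACC1": {"Ensembl", "HGNC", "GeneID"},
    "ACC2": {"Ensembl", "HGNC"},
    "ACC3": {"HGNC", "GeneID"},
    "ACC4": {"Ensembl", "HGNC", "GeneID"},
    "ACC5": set(),
}

# Each filter is a superset of the previous one, mirroring the three queries.
filters = [set(), {"Ensembl"}, {"Ensembl", "HGNC", "GeneID"}]
for f in filters:
    print(sorted(f), count_with_xrefs(toy_entries, f))
```

The drop between successive filters is exactly the set of entries missing one of the newly required x-refs, which is what the differences between the query counts measure.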
The BioMart numbers should be similar, but any way you look at it there is ~8% discordance (Swiss-Prot > Ensembl), with a smaller residual for HGNC and EGID (Entrez Gene ID). The numbers also indicate that ~1000 Ensembl proteins are not in Swiss-Prot (although some may be in TrEMBL).
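The percentages follow directly from the counts above. A quick check, using only the numbers quoted in this post; note that the last value is just the net difference between the two totals, not a per-entry comparison, so it is only a rough basis for the Ensembl-not-in-Swiss-Prot estimate:

```python
# Counts copied from the UniProt queries and the Ensembl 67.37 total above.
swissprot_human = 20_237
with_ensembl = 18_685
with_ensembl_hgnc_geneid = 18_250
ensembl_total = 21_065


def pct(part: int, whole: int) -> float:
    """Express `part` as a percentage of `whole`."""
    return 100.0 * part / whole


missing_ensembl = swissprot_human - with_ensembl        # Swiss-Prot lacking an Ensembl x-ref
residual = with_ensembl - with_ensembl_hgnc_geneid      # additionally lacking HGNC and/or GeneID
net_excess = ensembl_total - swissprot_human            # difference of totals only

print(f"No Ensembl x-ref: {missing_ensembl} ({pct(missing_ensembl, swissprot_human):.1f}%)")
print(f"HGNC/GeneID residual: {residual} ({pct(residual, swissprot_human):.1f}%)")
print(f"Ensembl total minus Swiss-Prot total: {net_excess}")
```

So the "~8%" is 1,552 of 20,237 entries (about 7.7%), and the HGNC/EGID residual is a further 435 entries (about 2.1%).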
For Q9Y5I3 it looks like the flat file had the Ensembl x-ref but the UniProt web interface did not (i.e. I can click UniProt > HGNC > Ensembl but cannot go directly from UniProt to Ensembl). Maybe this is the sync problem Julian points out.
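One way to check the flat file independently of the web interface is to read its `DR` (database cross-reference) lines directly. A minimal sketch, assuming the entry text has already been downloaded; the `DR` block below is a hypothetical placeholder in flat-file style, not the real cross-references of Q9Y5I3, and the parser ignores refinements such as isoform suffixes on Ensembl lines:

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical DR block in UniProt flat-file style (placeholder identifiers).
SAMPLE_DR = """\
DR   Ensembl; ENST00000000001; ENSP00000000001; ENSG00000000001.
DR   HGNC; HGNC:0000; GENE1.
DR   GeneID; 0000; -.
"""


def parse_dr_lines(text: str) -> Dict[str, List[List[str]]]:
    """Group DR line fields by database name (first field after 'DR   ')."""
    xrefs: Dict[str, List[List[str]]] = defaultdict(list)
    for line in text.splitlines():
        if not line.startswith("DR   "):
            continue
        body = line[5:].rstrip(".")  # drop the line-terminating period
        fields = [f.strip() for f in body.split(";")]
        xrefs[fields[0]].append(fields[1:])
    return dict(xrefs)


xrefs = parse_dr_lines(SAMPLE_DR)
print("Direct Ensembl x-ref in flat file:", "Ensembl" in xrefs)
```

If the flat file reports an Ensembl x-ref that the web view does not show, that points to a display or synchronisation issue rather than missing data.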
Julian, can you get PICR numbers of the same type as the ones I have shown?
7.8 years ago by cdsouthan