Hello. I'm doing some analysis using a human protein interaction network (http://cbg.garvan.unsw.edu.au/pina/download/Homo%20sapiens-20121210.sif) available from PINA (http://cbg.garvan.unsw.edu.au/pina/interactome.stat.do). When I query UniProt for the identifiers in the network, it turns out that a good number of them belong to now-deleted entries. See for example http://www.uniprot.org/uniprot/Q27223 and http://www.uniprot.org/uniprot/Q8NI70. The UniProt FAQ (http://www.uniprot.org/faq/11) explains that entries are most likely deleted because the associated nucleotide sequence data was retracted or came to be recognized as non-coding or a pseudogene. With that in mind, how should I deal with those supposed proteins whose entries have been deleted from UniProt? Is it best to simply delete them from the network, no questions asked?
Your two examples represent two different cases:
Q8NI70 was a reviewed (UniProtKB/Swiss-Prot) human entry. It was deleted in 2011 because it was not considered to be a real protein.
Q27223 was an unreviewed entry from C. elegans. Unfortunately, many unreviewed C. elegans entries were deleted from UniProt. Here is some background: the C. elegans case is unusual in that responsibility for the genome was originally split between American and European partners, so submissions went to both GenBank and ENA. All responsibility for submissions now lies in Europe, and the GenBank entries had to be transitioned to ENA records. This is very rare, and unfortunately the process did not succeed in mapping identical sequences. As a result, many UniProtKB entries were deleted and new entries were created for the same proteins.
If you have any questions about particular deleted entries, please don't hesitate to contact the UniProt helpdesk: firstname.lastname@example.org
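Since the two cases above differ (one entry was not a real protein, the other was re-created under a new accession), it may be safer to set aside the affected interactions for review than to drop them without a second look. The following is only a minimal sketch of that idea in Python, not an official UniProt or PINA procedure. It assumes the network is a whitespace-separated SIF file (source, interaction type, one or more targets) and that you have already collected the deleted accessions yourself. The function names `parse_sif` and `split_by_deleted` are hypothetical, and apart from the two accessions from the question, the identifiers and edges are made-up placeholders.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, interaction type, target)


def parse_sif(lines: Iterable[str]) -> List[Edge]:
    """Expand SIF lines of the form 'source type target [target ...]' into edges."""
    edges: List[Edge] = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank lines and nodes listed without any interaction
        source, relation = fields[0], fields[1]
        for target in fields[2:]:
            edges.append((source, relation, target))
    return edges


def split_by_deleted(edges: Iterable[Edge], deleted: Set[str]) -> Tuple[List[Edge], List[Edge]]:
    """Return (kept, flagged); flagged edges touch at least one deleted accession."""
    kept: List[Edge] = []
    flagged: List[Edge] = []
    for edge in edges:
        source, _, target = edge
        if source in deleted or target in deleted:
            flagged.append(edge)
        else:
            kept.append(edge)
    return kept, flagged


# Placeholder data: P11111 and P22222 are invented identifiers, and none of
# these edges are real interactions. Only Q8NI70 and Q27223 come from the question.
sif_lines = [
    "P11111 pp Q8NI70",
    "P11111 pp P22222",
    "Q27223 pp P11111",
]
deleted_accessions = {"Q8NI70", "Q27223"}  # assumed to be collected beforehand

kept, flagged = split_by_deleted(parse_sif(sif_lines), deleted_accessions)
print(len(kept), "edges kept;", len(flagged), "edges flagged for review")
```

The flagged list can then be checked entry by entry: interactions whose protein was re-created under a new accession could be remapped, while those involving entries judged not to be real proteins could be removed.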