Dealing With Pin Protein Entries Which Have Been Deleted From Uniprot
10.1 years ago
jarle.pahr ▴ 10

Hello. I'm doing some analysis using a human protein interaction network (http://cbg.garvan.unsw.edu.au/pina/download/Homo%20sapiens-20121210.sif) available from PINA (http://cbg.garvan.unsw.edu.au/pina/interactome.stat.do). When I query UniProt for the identifiers in the network, it turns out that a good number of them belong to entries that have since been deleted. See for example http://www.uniprot.org/uniprot/Q27223 and http://www.uniprot.org/uniprot/Q8NI70. The UniProt FAQ (http://www.uniprot.org/faq/11) explains that deletions are most likely caused by the associated nucleotide sequence data being retracted, or by the sequence coming to be recognized as non-coding or a pseudogene. With that in mind, how should I deal with those supposed proteins whose entries have been deleted from UniProt? Is it best to simply delete them from the network, no questions asked?

uniprot • 2.8k views
10.1 years ago

Your examples include 2 cases:

Q8NI70 was a reviewed (UniProtKB/Swiss-Prot) entry from human. It was deleted in 2011 because it was not considered to be a real protein.

Q27223 was an unreviewed entry from C. elegans. Unfortunately, many unreviewed C. elegans entries were deleted from UniProt. Here is some background on this: the C. elegans case is an unusual one in that responsibility for the genome was originally split between American and European partners, so submissions went to both GenBank and ENA. All responsibility for submissions is now in Europe, and the GenBank entries needed to be transitioned to ENA records. This is very rare, and unfortunately the transition process did not succeed in mapping identical sequences to each other. As a result, many UniProtKB entries were deleted and new entries were created for the same proteins.

If you have any questions about particular deleted entries, please don't hesitate to contact the UniProt helpdesk: help@uniprot.org


The entry Q8NI70 has been deleted from UniProt, but it is still a valid entry on the NCBI server ( http://www.ncbi.nlm.nih.gov/protein/Q8NI70.1?report=girevhist ). Shouldn't the two databases be synchronized?


Thanks for pointing this out, it is indeed surprising. We have brought this to the attention of NCBI staff.


Thank you for the information. I wonder how a C. elegans protein made its way into a supposedly human PIN. I would still welcome suggestions on how to handle such deleted entries in the context of a PIN. Perhaps it could depend on the reason for deletion and on whether an equivalent entry exists. I imagine there might be many categories of, or reasons for, deletion. Is there any way to programmatically look up the reason for deletion, or to find the equivalent entry if a new one has been made?


There is unfortunately no way to look up the reason for deletion. If a new entry has been linked to an obsolete entry, it will be found by an accession number query. However, if this is not the case, as with many deletions from the unreviewed TrEMBL section, the only way of finding a new corresponding entry is unfortunately a sequence similarity search.
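For the network itself, one possible way to act on this is sketched below in Python. It is only an illustration under stated assumptions, not an established procedure: the function name `clean_network`, the `replaced` mapping (obsolete accession to its linked successor, as found by accession queries) and the `deleted` set (accessions with no linked successor) are all hypothetical inputs that you would have to build yourself. Whether dropping the remaining deleted nodes is appropriate is a judgment call for the analysis at hand.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def clean_network(
    edges: Iterable[Edge],
    replaced: Dict[str, str],
    deleted: Set[str],
) -> List[Edge]:
    """Remap obsolete accessions and drop edges touching deleted ones.

    edges    -- (accession_a, accession_b) pairs from the network
    replaced -- hypothetical map: obsolete accession -> current accession
    deleted  -- hypothetical set: accessions with no known successor
    """
    kept: List[Edge] = []
    seen: Set[Edge] = set()
    for a, b in edges:
        # Follow the link to the successor entry where one is known.
        a = replaced.get(a, a)
        b = replaced.get(b, b)
        # Drop the interaction if either partner has no current entry.
        if a in deleted or b in deleted:
            continue
        # Remapping can create duplicate edges; keep the first only.
        if (a, b) in seen:
            continue
        seen.add((a, b))
        kept.append((a, b))
    return kept
```

In this sketch an edge is treated as directed, so `(A, B)` and `(B, A)` are kept as separate entries; for an undirected network you might sort each pair before the duplicate check.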

I just downloaded the human PIN set. The first column contains mainly identifiers for human UniProtKB entries, although there are also a few human viruses. The second column contains proteins from various organisms. (I extracted the UniProtKB identifiers from the first and second column, submitted them to a "batch retrieval" on uniprot.org, then viewed the corresponding UniProtKB entries by taxonomy.)
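The identifier extraction described above could be scripted roughly as follows. This is a minimal Python sketch that assumes the file follows the usual SIF layout of whitespace-separated fields (source node, interaction type, then one or more target nodes); the function name `sif_identifiers` is made up for illustration, and the actual layout of the PINA file should be checked before relying on it.

```python
from typing import Iterable, Set, Tuple


def sif_identifiers(lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Collect source and target identifiers from SIF-style lines.

    Assumes each line looks like: <source> <interaction type> <target> [...]
    """
    sources: Set[str] = set()
    targets: Set[str] = set()
    for line in lines:
        fields = line.split()
        # Skip blank lines and lines without at least one target.
        if len(fields) < 3:
            continue
        sources.add(fields[0])
        # fields[1] is the interaction type; the rest are targets.
        targets.update(fields[2:])
    return sources, targets
```

The two sets can then be written out one identifier per line, for example with `"\n".join(sorted(sources))`, and pasted into the batch retrieval form on uniprot.org.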
