What does the "Uncharacterized protein" UniProt designation mean?
I am BLASTing some predicted peptides against the UniProt database, and many of the hits are "Uncharacterized protein". How is this designation assigned? In other words, what level of evidence is required for a peptide sequence to be added to the database and given this designation rather than be excluded?

I don't see it described on the UniProt website. I tried to find the publication where this term is explained, but there are a huge number of publications going back to 1997 (and I can't access that one): https://www.uniprot.org/help/publications


what level of evidence is required for a peptide sequence to be added to the database and given this designation rather than be excluded?

Probably not a lot.

If you look at the history of one such entry (https://www.uniprot.org/uniprotkb/Q9H425/history), you will see that it was originally added via TrEMBL. After a number of years, it appears to have been reported in a mass spectrometry paper (https://rest.uniprot.org/unisave/Q9H425?format=txt&versions=26). It has kept this designation since then.
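To see which archived versions of an entry carried the name, you could fetch versions from the UniSave endpoint used in the link above and pull the protein name out of the `DE` lines. A minimal sketch, assuming the URL pattern from that link and the usual flat-file layout where names appear as `Full=` on `DE` lines; the helper names (`unisave_url`, `fetch_version`, `full_names`) are hypothetical, and only `fetch_version` touches the network:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the UniSave link above; the helpers below are hypothetical.
UNISAVE_BASE = "https://rest.uniprot.org/unisave"


def unisave_url(accession: str, version: int, fmt: str = "txt") -> str:
    """Build the UniSave URL for one archived version of a UniProtKB entry."""
    query = urlencode({"format": fmt, "versions": version})
    return f"{UNISAVE_BASE}/{accession}?{query}"


def fetch_version(accession: str, version: int) -> str:
    """Download one archived version as flat-file text (needs network access)."""
    with urlopen(unisave_url(accession, version)) as response:
        return response.read().decode("utf-8")


def full_names(flat_text: str) -> list:
    """Return every 'Full=' protein name found on DE lines of a flat-file entry."""
    names = []
    for line in flat_text.splitlines():
        if line.startswith("DE") and "Full=" in line:
            name = line.split("Full=", 1)[1]
            # Drop a trailing evidence block such as " {ECO:...}" and the final ';'
            name = name.split(" {", 1)[0].rstrip(";").strip()
            names.append(name)
    return names
```

For example, `full_names(fetch_version("Q9H425", 26))` should list the name(s) recorded in that version; looping over several version numbers shows when the name changed, if it ever did.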

You should use the reviewed Swiss-Prot part of UniProt or, better yet, a specific proteome if possible.
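If your BLAST database was built from UniProt FASTA files, the header prefix already tells you which part a hit came from: `sp` for Swiss-Prot (reviewed) and `tr` for TrEMBL (unreviewed). A minimal sketch for filtering hits, assuming standard UniProt FASTA headers of the form `db|accession|entry_name protein name OS=...`; `parse_uniprot_header` and `keep_hit` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class Hit:
    accession: str
    reviewed: bool       # True for 'sp' (Swiss-Prot), False for 'tr' (TrEMBL)
    protein_name: str


def parse_uniprot_header(header: str) -> Hit:
    """Parse a UniProt-style FASTA header (leading '>' optional)."""
    header = header.lstrip(">")
    ident, _, description = header.partition(" ")
    db, accession, _entry_name = ident.split("|", 2)
    if db not in ("sp", "tr"):
        raise ValueError(f"not a UniProtKB header: {header!r}")
    # The protein name runs up to the first ' OS=' (organism) key.
    protein_name = description.split(" OS=", 1)[0].strip()
    return Hit(accession=accession, reviewed=(db == "sp"), protein_name=protein_name)


def keep_hit(hit: Hit, reviewed_only: bool = True) -> bool:
    """Drop unreviewed hits (optionally) and anything named 'Uncharacterized protein'."""
    if reviewed_only and not hit.reviewed:
        return False
    return not hit.protein_name.lower().startswith("uncharacterized protein")
```

Filtering after the search is a stopgap; searching against Swiss-Prot or a single proteome in the first place avoids the issue entirely.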


Generally, in a UniProt entry, it is important to look at the evidence labels in order to distinguish between:

  • expert-curated and reviewed annotation
  • automatic annotation
  • information imported from an external database

See https://www.uniprot.org/help/evidences
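In the flat-file (text) format, these evidence labels appear as ECO codes in braces after an annotation, e.g. `{ECO:0000313|Ensembl:...}`. A rough sketch that extracts the codes and sorts them into the three categories listed above. The lookup table is deliberately small and is an assumption to check against the evidence help page; any code not in it is reported as "unknown":

```python
import re

# Illustrative subset only (assumption): verify codes at https://www.uniprot.org/help/evidences
ECO_CATEGORY = {
    "ECO:0000269": "manual",     # experimental evidence used in manual assertion
    "ECO:0000305": "manual",     # curator inference used in manual assertion
    "ECO:0000250": "manual",     # sequence similarity evidence used in manual assertion
    "ECO:0000256": "automatic",  # match to sequence model evidence used in automatic assertion
    "ECO:0000312": "imported",   # imported information used in manual assertion
    "ECO:0000313": "imported",   # imported information used in automatic assertion
}

# Matches one "{...}" evidence block and captures its contents.
EVIDENCE_BLOCK = re.compile(r"\{([^}]*)\}")


def evidence_categories(line: str) -> list:
    """Return (ECO code, category) pairs for every evidence tag on a flat-file line."""
    results = []
    for block in EVIDENCE_BLOCK.findall(line):
        # A block may hold several comma-separated items, each "ECO:nnnnnnn|Source:ID".
        for item in block.split(","):
            code = item.strip().split("|", 1)[0]
            results.append((code, ECO_CATEGORY.get(code, "unknown")))
    return results
```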

There are unfortunately many uncharacterized proteins, in particular in UniProtKB/TrEMBL: https://www.uniprot.org/uniprotkb?query=(protein_name:%22uncharacterized%20protein%22)
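The same search can be built programmatically, for instance to compare reviewed and unreviewed counts or to restrict it to one proteome. A sketch, assuming the UniProt REST search endpoint `https://rest.uniprot.org/uniprotkb/search` and the query fields `protein_name`, `reviewed` and `proteome`; it only builds the URL, and the field names should be checked against the UniProt query documentation:

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Assumed REST endpoint for UniProtKB searches.
SEARCH_BASE = "https://rest.uniprot.org/uniprotkb/search"


def uncharacterized_query(reviewed: Optional[bool] = None,
                          proteome: Optional[str] = None) -> str:
    """Build a query for 'uncharacterized protein' names, optionally
    restricted to reviewed/unreviewed entries or to one proteome."""
    clauses = ['(protein_name:"uncharacterized protein")']
    if reviewed is not None:
        clauses.append(f"(reviewed:{'true' if reviewed else 'false'})")
    if proteome is not None:
        clauses.append(f"(proteome:{proteome})")
    return " AND ".join(clauses)


def search_url(query: str,
               fields=("accession", "protein_name", "reviewed")) -> str:
    """Percent-encode the query into a TSV search URL (no request is sent)."""
    params = {"query": query, "format": "tsv", "fields": ",".join(fields)}
    return f"{SEARCH_BASE}?{urlencode(params, quote_via=quote)}"
```

For example, `search_url(uncharacterized_query(reviewed=False))` targets TrEMBL only, and passing a proteome ID such as `UP000005640` (the human reference proteome) narrows it to one organism.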

Example: https://www.uniprot.org/uniprotkb/H3BNH8/entry is imported from Ensembl ENSP00000454861, which describes it as a "novel protein".

Regarding expert-curated, reviewed entries (i.e. those in UniProtKB/Swiss-Prot):

Protein naming guidelines are available here: https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/docs/International_Protein_Nomenclature_Guidelines.pdf

They recommend using "uncharacterized protein" in certain cases, as a last resort, for novel proteins of unknown function.

See also the news article from UniProt release 2011_04: "The art of defining the unknown", https://www.uniprot.org/help/2011-04-05-release


