Where/how do proteins with "-like" in their names get their names?
2.3 years ago
Dunois

Perhaps the title of my post is misleading, but for instance consider this "Period-like" protein. According to this link:

Proteins of unknown function which exhibit significant sequence similarity to a defined protein family have been named in accordance with other members of that family, e.g. "Holliday junction resolvase family endonuclease".

It is also possible to use "-like" in the name. Bear in mind that this should only be used for cases that are outliers to a tight homomorphic family, e.g. "Holliday junction resolvase-like protein".

(Emphasis mine.)

So I could presume that the "Period-like" protein was named so because it exhibited "significant" similarity to the Period protein family, but its function was not/has not been (experimentally?) confirmed.

However, this reductase was probably also assigned its name based on sequence similarity. UniProt indicates that it too was "predicted", just like the Period-like protein, as shown by the Status line on the UniProt webpage. Most likely this particular protein has no additional evidence backing its identification either. So why wasn't it named "NADH:ubiquinone reductase-like" instead?

In general I would like to know under what circumstances a protein gets suffixed with the "-like" moniker (and when it wouldn't). Additionally, when one encounters a "X-like" protein, which properties of the protein can one presume to be related to the family that protein is purported to belong to? (Fold? Function?)

(I apologize if this is something that is supposed to be obvious and/or trivial.)

Edit: this NCBI link does explain the usage of the "-like" suffix in the context of equivalog-type HMMs, but the first and last points under that section seem to contradict one another somewhat. (The first point states that equivalogs are homologs that share a specific function but whose evolutionary relationship is unknown, while the last point claims that, despite obvious sequence similarity to XXX, the protein may or may not have the same role and function as XXX.)

I could be wrong, but in my experience annotation with "-like" doesn't appear to follow an especially systematic/objective process. I don't think there are obvious criteria such as "ABC-like proteins must be 30-50% identical to ABC proteins and lack experimental evidence". I think it's a slightly more holistic way of 'organising' the information.

"So I could presume that the "Period-like" protein was named so because it exhibited "significant" similarity to the Period protein family, but its function was not/has not been (experimentally?) confirmed."

That is a good assumption. It is a placeholder that an annotator adds to the name when they can't be reasonably certain that the protein is what they think it is. All indications point to the new protein having that function/characteristics/fold, but it remains a hypothesis until someone proves it experimentally.


In a UniProt entry, it is important to look at the evidence label in order to distinguish between:

• expert-curated and reviewed annotation
• automatic annotation
• information imported from an external database
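Behind these labels, UniProt records evidence as codes from the Evidence and Conclusion Ontology (ECO). As a rough illustration, the sketch below maps a handful of common ECO codes onto the three categories above. The mapping is deliberately partial and is an assumption to check against UniProt's evidence documentation; the helper name `evidence_category` is hypothetical.

```python
# Hypothetical helper: a PARTIAL mapping from ECO evidence codes, as they appear
# in UniProt evidence tags, to the three broad categories listed above.
# Assumption: only a few common codes are included; see UniProt's evidence
# documentation for the complete list.
ECO_CATEGORIES = {
    "ECO:0000269": "curated",    # experimental evidence used in manual assertion
    "ECO:0000250": "curated",    # sequence similarity evidence used in manual assertion
    "ECO:0000305": "curated",    # curator inference used in manual assertion
    "ECO:0000256": "automatic",  # match to sequence model evidence used in automatic assertion
    "ECO:0000313": "imported",   # imported information used in automatic assertion
}


def evidence_category(eco_code: str) -> str:
    """Return the broad category for an ECO code, or 'unknown' if it is not mapped."""
    return ECO_CATEGORIES.get(eco_code.strip(), "unknown")


if __name__ == "__main__":
    for code in ("ECO:0000313", "ECO:0000256", "ECO:0000269", "ECO:9999999"):
        print(code, "->", evidence_category(code))
```

Unmapped codes fall through to "unknown" rather than being guessed.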

Looking at the entries you cite:

https://www.uniprot.org/uniprot/A0A075IMJ3 is an unreviewed entry (i.e. in UniProtKB/TrEMBL), and the protein name has been imported from nucleotide sequence database entry EMBL:AIF31262.1. Since ENA/GenBank/DDBJ is an archive, protein names and other annotations are provided directly by the submitters, without major intervention by a curator. The submitter may or may not have made the effort to follow the international protein naming guidelines that you found.

https://www.uniprot.org/uniprot/A0A2D1GRS1 is also an unreviewed entry, but its protein name was assigned by an automatic annotation pipeline, in this case ARBA (https://www.uniprot.org/help/arba). You can click on the ARBA rule name in the evidence tag for more information (https://www.uniprot.org/arba/ARBA00012944), where you will see that the name is based on the EC number EC:7.1.1.2. You can also consult the entry history to see how the name evolved, in particular what the submitted name was before automatic annotation: https://www.uniprot.org/uniprot/A0A2D1GRS1?version=6&version=7&diff=true
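In the flat-file (text) view of an entry, this provenance typically shows up in braces after each name: a submitted name (`SubName`) imported from EMBL carries an ECO:0000313 tag pointing at the EMBL record, while a rule-assigned recommended name carries an ECO:0000256 tag pointing at the rule. The sketch below pulls the name and its evidence out of such a line. The two example lines are illustrative reconstructions built from the identifiers mentioned above, not verbatim copies of the current entries, and the regular expression is a simplified assumption about the DE line format (it ignores `Short=`, `EC=` and other variants).

```python
import re
from typing import List, Optional, Tuple

# Simplified pattern for a DE name line, e.g.
#   DE   SubName: Full=Some name {ECO:0000313|EMBL:XXX};
# Groups: name type (RecName/AltName/SubName), full name, and the optional
# contents of the evidence braces.
DE_NAME = re.compile(
    r"^DE\s+(RecName|AltName|SubName):\s+Full=(.+?)(?:\s*\{([^}]*)\})?;\s*$"
)

Evidence = List[Tuple[str, Optional[str]]]


def parse_de_name(line: str) -> Optional[Tuple[str, str, Evidence]]:
    """Return (name_type, name, [(eco_code, source), ...]), or None if no match."""
    m = DE_NAME.match(line)
    if m is None:
        return None
    name_type, name, evidence = m.groups()
    tags: Evidence = []
    if evidence:
        # Several evidences can share one brace block, separated by commas.
        for item in evidence.split(","):
            eco, _, source = item.strip().partition("|")
            tags.append((eco, source or None))
    return name_type, name, tags


# Illustrative lines modelled on the two entries discussed above.
examples = [
    "DE   SubName: Full=Period-like protein {ECO:0000313|EMBL:AIF31262.1};",
    "DE   RecName: Full=NADH:ubiquinone reductase (H(+)-translocating) "
    "{ECO:0000256|ARBA:ARBA00012944};",
]
for line in examples:
    print(parse_de_name(line))
```

The source after the `|` (EMBL accession vs. ARBA rule ID) is what tells the two cases apart.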

If you want to see reviewed entries with a protein name "-like", try this query: https://www.uniprot.org/uniprot/?query=name%3A*like+reviewed%3Ayes&sort=score
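If you want to tweak that query (for example, dropping the reviewed filter), it can help to see how the link is put together. This is a minimal sketch that rebuilds the legacy search URL from its parts; the function name is hypothetical, and the newer UniProt website and REST API may use different URLs and field names, so check the current documentation before relying on it.

```python
from urllib.parse import urlencode

# Legacy search endpoint used in the link above (assumption: it may now redirect).
LEGACY_SEARCH = "https://www.uniprot.org/uniprot/"


def like_name_query(reviewed: bool = True) -> str:
    """Build the legacy search URL for entries with a name ending in 'like'."""
    terms = "name:*like"
    if reviewed:
        terms += " reviewed:yes"
    # safe="*" keeps the wildcard literal, as in the original link;
    # ':' is encoded as %3A and spaces become '+'.
    params = urlencode({"query": terms, "sort": "score"}, safe="*")
    return f"{LEGACY_SEARCH}?{params}"


print(like_name_query())
```

With the default argument this reproduces the link above exactly.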

Note that many of the "-like" names are not the recommended names, but can be found in alternative names.
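For a set of names you have already retrieved, a small check like the one below separates "-like" names used as the recommended name from those that only appear as alternative names. The `ProteinNames` container and the example names are hypothetical simplifications, not UniProt's data model, and the word-level "-like" test is an assumption about what counts as a "-like" name.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProteinNames:
    """Hypothetical, simplified container for an entry's names (not UniProt's schema)."""
    recommended: str
    alternatives: List[str] = field(default_factory=list)


def is_like_name(name: str) -> bool:
    """True if any word of the name ends in '-like', e.g. 'Period-like protein'."""
    return any(word.lower().endswith("-like") for word in name.split())


def like_name_locations(names: ProteinNames) -> Dict[str, List[str]]:
    """Split an entry's '-like' names into recommended and alternative ones."""
    return {
        "recommended": [names.recommended] if is_like_name(names.recommended) else [],
        "alternative": [n for n in names.alternatives if is_like_name(n)],
    }


# Hypothetical entry: the '-like' form appears only among the alternative names.
entry = ProteinNames("Example protein 1", ["Example-like protein 1", "EXP1"])
print(like_name_locations(entry))
```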

Don't hesitate to contact the UniProt helpdesk if you have any additional questions about UniProt.

@Elisabeth Gasteiger, thank you for the detailed explanation of what's happening behind the curtain regarding protein naming. It's super useful (and super nice) to have an answer from UniProt directly, and also heartening to see the massive work being put into making UniProt a comprehensive and user-friendly database.

So I presume the tl;dr version of "-like" naming boils down to either submitter preferences (conventions considered or otherwise) or assignment by an automated annotation pipeline (which I presume would be much more consistent).

As an aside here, do imported protein names (such as the one in my example) get reviewed (and potentially re-annotated) by ARBA?

"If you want to see reviewed entries with a protein name "-like", try this query: https://www.uniprot.org/uniprot/?query=name%3A*like+reviewed%3Ayes&sort=score"

So searching with the name field also searches through all the alternative names of the entry?

"As an aside here, do imported protein names (such as the one in my example) get reviewed (and potentially re-annotated) by ARBA?"

Yes. At every release, sequences go through the automatic annotation pipelines (see also https://www.uniprot.org/help/automatic_annotation) and if the conditions of a rule match, the corresponding annotations are applied to the entry. This happened for A0A2D1GRS1 for which I showed you the history link above: https://www.uniprot.org/uniprot/A0A2D1GRS1?version=6&version=7&diff=true - the submitted name was replaced by a "recommended name" assigned by ARBA.
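The release-by-release behaviour described here can be pictured with a toy model: at each release every rule is re-checked against every entry, and a matching rule sets the name, superseding the submitted one. This is only an illustration of the idea, not ARBA's actual implementation; as I understand it, real rules combine conditions such as InterPro signatures and taxonomy, and every identifier below (`Rule`, `Entry`, `run_release`, `RULE_1`, `SIG_A`, etc.) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Rule:
    """Toy rule: if an entry has all required signatures, assign a name."""
    rule_id: str
    required_signatures: FrozenSet[str]
    recommended_name: str


@dataclass
class Entry:
    accession: str
    signatures: Set[str]
    submitted_name: str
    recommended_name: Optional[str] = None
    history: List[str] = field(default_factory=list)


def run_release(entries: List[Entry], rules: List[Rule]) -> None:
    """Re-check every rule against every entry, as in one toy 'release'."""
    for entry in entries:
        for rule in rules:
            matches = rule.required_signatures <= entry.signatures
            # Only record a change when the name would actually differ,
            # so re-running an unchanged release leaves the history alone.
            if matches and entry.recommended_name != rule.recommended_name:
                old = entry.recommended_name or entry.submitted_name
                entry.recommended_name = rule.recommended_name
                entry.history.append(f"{old} -> {rule.recommended_name} ({rule.rule_id})")


rules = [Rule("RULE_1", frozenset({"SIG_A", "SIG_B"}), "Example reductase")]
entry = Entry("ENTRY_1", {"SIG_A", "SIG_B", "SIG_C"}, "hypothetical protein")
run_release([entry], rules)
run_release([entry], rules)  # a second release changes nothing
print(entry.recommended_name, entry.history)
```

If several rules match, this toy simply lets the last one win; a real pipeline would need an explicit way to resolve such conflicts.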

"So searching with the name field also searches through all the alternative names of the entry?"

Yes, all names are searched.