How to handle TrEMBL proteins without gene annotation in plasma proteomics?
15 hours ago
Luwell • 0

Hi everyone,

I’m new to proteomics and currently analyzing baseline plasma proteomics data from a clinical trial to identify treatment-related biomarkers.

According to the protein identification report, about 4,000 proteins were detected, but only around 900 proteins have annotated PG (protein group) gene names.

I performed a standard proteomics screening workflow using protein names as identifiers and obtained around 200 potentially important proteins (p < 0.05), though only 2 proteins remained significant after FDR correction (FDR < 0.05).

However, among these candidates, only about 30 proteins can be mapped to a gene name, while the majority are TrEMBL entries (e.g., UniProt IDs starting with A0A5…). These entries cannot be mapped to gene symbols in UniProt, which makes them unusable for pathway enrichment or gene-level analyses. Some have similar sequences (~90% identity) to reviewed Swiss-Prot entries, but annotation confidence is low.

[Image: list of candidate protein accessions]

I have a few questions:

  1. How are such TrEMBL / unreviewed proteins typically handled in proteomics studies? Should I keep only the proteins that have gene-level annotation (Swiss-Prot reviewed entries)?
  2. Or should I try to annotate TrEMBL entries via sequence similarity (e.g., 90% identity match or BLAST against the human nr database)?
  3. For plasma proteomics, since many proteins are secreted and potentially functional, is it reasonable to experimentally test some of these TrEMBL proteins? For example, we plan to test whether some candidate proteins can enhance the effect of therapy. If we proceed, we would synthesize the protein from the UniProt FASTA sequence, though commercial reagents (antibodies, recombinant proteins) are rarely available. Has anyone used such TrEMBL-only proteins for experimental validation or biomarker discovery?

Any advice, tools, or examples of how others handle these "uncharacterized" UniProt entries in discovery proteomics would be greatly appreciated.

Thank you!

biomarker uniprot proteomics annotation trembl • 180 views

Note that UniProtKB accession numbers carry no inherent meaning. While most ACs starting with A0A5* are indeed unreviewed (i.e. from TrEMBL), an increasing number of them will appear in Swiss-Prot, the reviewed section, as you can see in this query: https://www.uniprot.org/uniprotkb?dir=ascend&query=reviewed%3Atrue&sort=accession

Accession number documentation can be found at https://www.uniprot.org/help/accession_numbers

When entries are reviewed by UniProt expert biocurators, they move from TrEMBL into Swiss-Prot and keep their accession numbers. This doesn't answer your question of course, but I wanted to set this straight.
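
In practice, review status is easier to read from the FASTA header of the search database than from the accession itself. Here is a minimal sketch in Python, assuming the standard UniProtKB FASTA header layout (`>sp|...` for Swiss-Prot, `>tr|...` for TrEMBL, optional `GN=` field) and the accession pattern given on the accession-number help page. `parse_header` is a hypothetical helper name, and the description in the TrEMBL example header is a placeholder, not the real entry text.

```python
import re

# Accession pattern as published on the UniProt accession-number help page.
ACCESSION_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)

# Header layout: >db|ACCESSION|ENTRY_NAME description OS=... OX=... [GN=...] PE=... SV=...
HEADER_RE = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)")
GENE_RE = re.compile(r"\bGN=(\S+)")


def parse_header(header: str) -> dict:
    """Extract accession, review status and gene name from one FASTA header."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"not a UniProtKB FASTA header: {header!r}")
    db, accession, entry_name = match.groups()
    gene = GENE_RE.search(header)
    return {
        "accession": accession,
        "entry_name": entry_name,
        "reviewed": db == "sp",  # the db tag, not the accession, carries review status
        "gene": gene.group(1) if gene else None,
        "valid_accession": bool(ACCESSION_RE.match(accession)),
    }


if __name__ == "__main__":
    headers = [
        ">sp|P02768|ALBU_HUMAN Albumin OS=Homo sapiens OX=9606 GN=ALB PE=1 SV=2",
        # Placeholder description for illustration only.
        ">tr|A0A5C2G3P6|A0A5C2G3P6_HUMAN Example protein OS=Homo sapiens OX=9606 PE=4 SV=1",
    ]
    for h in headers:
        print(parse_header(h))
```

Both accessions match the same pattern, which is the point: the format alone does not tell you whether an entry is reviewed.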


> I performed a standard proteomics screening workflow

What database did you search against? Perhaps you are using an outdated database.

7 hours ago

It looks like you are working with human proteins. I would strongly recommend looking only at proteins that are part of the human reference proteome: https://www.uniprot.org/proteomes/UP000005640

I checked only the first AC in your list (the list is an image, so I cannot easily copy and paste it), and at least that one

https://www.uniprot.org/uniprotkb/A0A5C2G3P6

is not part of the proteome. Only a few hundred proteins from the human proteome have no gene names: https://www.uniprot.org/uniprotkb?query=reviewed%3Afalse+NOT+gene%3A*+AND+proteome%3AUP000005640

Please note that with the exception of selected entries that include biologically important information, all entries that are not part of reference proteomes will soon be deleted from TrEMBL: https://insideuniprot.blogspot.com/2025/06/capturing-diversity-of-life.html
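
To restrict a result list to the human reference proteome as suggested above, something like the following could work. It is only a sketch: it assumes the UniProt REST `stream` endpoint with `format=list` returns one accession per line for a `proteome:` query, which is worth checking against the current API documentation. The function names are illustrative. Keeping a semicolon-separated protein group when any of its members is in the reference proteome is an assumption about how your report encodes groups.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against the current UniProt REST documentation.
UNIPROT_STREAM = "https://rest.uniprot.org/uniprotkb/stream"


def proteome_url(proteome_id: str = "UP000005640") -> str:
    """Build the query URL listing all accessions of one proteome."""
    params = {"query": f"proteome:{proteome_id}", "format": "list"}
    return f"{UNIPROT_STREAM}?{urlencode(params)}"


def fetch_reference_accessions(proteome_id: str = "UP000005640") -> set:
    """Download the accession list (needs network access)."""
    with urlopen(proteome_url(proteome_id)) as response:
        text = response.read().decode("utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def split_by_reference(groups, reference):
    """Split protein groups into (kept, dropped).

    A group such as "P02768;A0A5C2G3P6" is kept if at least one of its
    accessions is in the reference set (an assumption, see lead-in).
    """
    kept, dropped = [], []
    for group in groups:
        members = [acc.strip() for acc in group.split(";") if acc.strip()]
        if any(acc in reference for acc in members):
            kept.append(group)
        else:
            dropped.append(group)
    return kept, dropped
```

Usage would be along the lines of `reference = fetch_reference_accessions()` followed by `kept, dropped = split_by_reference(my_groups, reference)`, where `my_groups` is the accession column of your report.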

8 hours ago
Aleksandra ▴ 160

For the basic analysis, discard anything that isn't from Swiss-Prot. Your statistics and pathway analysis should be based only on reliable, annotated proteins. Manually attaching annotations to TrEMBL hits via BLAST is futile: it is not reproducible and makes it look like you're cherry-picking data.

That said, a significant TrEMBL hit with no name could be your breakthrough. Before spending money on antibodies, run its sequence through InterProScan to find domains and through Phobius to check for a signal peptide (i.e. whether it is likely secreted). If the results make sense, you may have a real candidate worth pursuing in the lab. In a nutshell, get your analysis in order first, and then start treasure hunting in the TrEMBL list.
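
A rough sketch of that two-track approach, assuming each result row already carries a `reviewed` flag and a raw p-value (the dictionary keys and function names here are made up for illustration). The FDR step is a plain Benjamini-Hochberg adjustment, applied only to the Swiss-Prot rows; the TrEMBL rows are set aside for follow-up instead of being tested alongside them. Filtering on annotation status before correction is a design choice, so report it as such.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def adjust_reviewed_only(results):
    """Split rows into (main, follow_up).

    main      : Swiss-Prot rows, each copied with an added 'q' (BH-adjusted p)
    follow_up : TrEMBL rows, untouched, for later manual inspection
    """
    reviewed = [row for row in results if row["reviewed"]]
    follow_up = [row for row in results if not row["reviewed"]]
    qvalues = benjamini_hochberg([row["p"] for row in reviewed])
    main = [dict(row, q=q) for row, q in zip(reviewed, qvalues)]
    return main, follow_up


if __name__ == "__main__":
    # Made-up p-values, for illustration only.
    rows = [
        {"accession": "P02768", "reviewed": True, "p": 0.01},
        {"accession": "A0A5C2G3P6", "reviewed": False, "p": 0.001},
        {"accession": "P01023", "reviewed": True, "p": 0.2},
    ]
    main, follow_up = adjust_reviewed_only(rows)
    print(main)
    print(follow_up)
```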
