Hi everyone,
I’m new to proteomics and currently analyzing baseline plasma proteomics data from a clinical trial to identify treatment-related biomarkers.
According to the protein identification report, about 4,000 proteins were detected, but only around 900 proteins have annotated PG (protein group) gene names.
I ran a standard proteomics screening workflow using protein names as identifiers and obtained around 200 candidate proteins at p < 0.05, although only 2 proteins remained significant after FDR correction (FDR < 0.05).
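For reference, the gap between the p < 0.05 list and the FDR < 0.05 list can be reproduced with a small Benjamini-Hochberg sketch. This assumes BH is the FDR method used, which the workflow above does not state; the function name and the p-values are made up for illustration and are not from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, for illustration only.
example_p = [0.0001, 0.0004, 0.019, 0.03, 0.04, 0.2, 0.5, 0.9]
q_values = benjamini_hochberg(example_p)

nominal_hits = [i for i, p in enumerate(example_p) if p < 0.05]
fdr_hits = [i for i, q in enumerate(q_values) if q < 0.05]
print(len(nominal_hits), "nominal hits;", len(fdr_hits), "after BH")
```

In this toy input, five values pass p < 0.05 but only two pass the adjusted threshold, the same pattern as in the real data.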
However, among these candidates, only about 30 proteins can be mapped to a gene name, while the majority are TrEMBL entries (e.g., UniProt IDs starting with A0A5…). These entries cannot be mapped to gene symbols in UniProt, which makes them unusable for pathway enrichment or gene-level analyses. Some have similar sequences (~90% identity) to reviewed Swiss-Prot entries, but annotation confidence is low.
I have a few questions:
- How are such TrEMBL / unreviewed proteins typically handled in proteomics studies? Should I keep only the proteins that have gene-level annotation (Swiss-Prot reviewed entries)?
- Or should I try to annotate TrEMBL entries via sequence similarity (e.g., 90% identity match or BLAST against the human nr database)?
- For plasma proteomics, since many proteins are secreted and potentially functional, is it reasonable to experimentally test some of these TrEMBL proteins? For example, we plan to test whether some candidate proteins can enhance the effect of therapy. If we proceed, we would synthesize the protein from the UniProt FASTA sequence, though commercial reagents (antibodies, recombinant proteins) are rarely available. Has anyone used such TrEMBL-only proteins for experimental validation or biomarker discovery?
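On the sequence-similarity option in the list above, a "90% identity" figure depends on how it is computed and over how much of the protein. The sketch below assumes a pairwise alignment is already available as two gapped strings (for example, copied from an aligner's output) and uses one common convention: identical columns divided by alignment length, gaps included. Other tools may use a different denominator. The function name and the toy sequences are hypothetical.

```python
def alignment_stats(aligned_query, aligned_subject, query_length):
    """Return (percent identity, percent query coverage) for one alignment.

    aligned_query / aligned_subject: equal-length strings with '-' for gaps.
    query_length: length of the full, unaligned query sequence.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    columns = len(aligned_query)
    if columns == 0 or query_length <= 0:
        raise ValueError("empty alignment or invalid query length")

    # Count columns where both sequences carry the same residue (not a gap).
    identical = sum(
        1 for a, b in zip(aligned_query, aligned_subject) if a == b and a != "-"
    )
    # Count query residues that fall inside the aligned region.
    query_residues = sum(1 for a in aligned_query if a != "-")

    identity = 100.0 * identical / columns
    coverage = 100.0 * query_residues / query_length
    return identity, coverage


# Toy example: 9 of 10 aligned columns match, but the alignment
# spans only 10 residues of a (hypothetical) 20-residue query.
identity, coverage = alignment_stats("MKWVTFISLL", "MKWVTFLSLL", query_length=20)
print(f"identity={identity:.1f}% coverage={coverage:.1f}%")
```

Reporting coverage next to identity helps separate a full-length near-match to a Swiss-Prot entry from a high-identity hit over a short fragment.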
Any advice, tools, or examples of how others handle these "uncharacterized" UniProt entries in discovery proteomics would be greatly appreciated.
Thank you!
Note that UniProtKB accession numbers carry no inherent meaning. While most accession numbers (ACs) starting with A0A5 are indeed unreviewed (i.e. from TrEMBL), an increasing number of them can be found in Swiss-Prot, the reviewed section, as this query shows: https://www.uniprot.org/uniprotkb?dir=ascend&query=reviewed%3Atrue&sort=accession
Accession number documentation can be found at https://www.uniprot.org/help/accession_numbers
When entries are reviewed by UniProt expert biocurators, they move from TrEMBL into Swiss-Prot and keep their accession numbers. This doesn't answer your question of course, but I wanted to set this straight.
Which database did you search against? Perhaps you are using an outdated release.
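One way to check review status and gene names without relying on accession prefixes is to read the headers of the search FASTA itself. In UniProt-style FASTA headers, the leading `sp|` or `tr|` marks Swiss-Prot or TrEMBL, and an optional `GN=` field carries the gene name. The sketch below is a minimal parser under that assumption. The function name is hypothetical, and the `A0A5...` accessions in the example headers are made up for illustration.

```python
import re

# >db|Accession|EntryName ... where db is "sp" (Swiss-Prot) or "tr" (TrEMBL).
HEADER_RE = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)")
# Optional gene name field, e.g. "GN=ALB".
GENE_RE = re.compile(r"\bGN=(\S+)")


def parse_uniprot_header(header):
    """Parse one UniProt-style FASTA header; return None if it does not match."""
    match = HEADER_RE.match(header)
    if match is None:
        return None
    gene = GENE_RE.search(header)
    return {
        "accession": match.group(2),
        "entry_name": match.group(3),
        "reviewed": match.group(1) == "sp",
        "gene": gene.group(1) if gene else None,
    }


# Illustrative headers; the A0A5... accessions are invented.
headers = [
    ">sp|P02768|ALBU_HUMAN Albumin OS=Homo sapiens OX=9606 GN=ALB PE=1 SV=2",
    ">tr|A0A5X0X0X1|A0A5X0X0X1_HUMAN Uncharacterized protein OS=Homo sapiens OX=9606 PE=4 SV=1",
    ">sp|A0A5X0X0X2|EXMPL_HUMAN Example protein OS=Homo sapiens OX=9606 GN=EXMPL PE=1 SV=1",
]
parsed = [parse_uniprot_header(h) for h in headers]
for entry in parsed:
    print(entry["accession"], "reviewed" if entry["reviewed"] else "unreviewed", entry["gene"])
```

Running this over the full search database shows how many entries are actually unreviewed and how many of those still carry a `GN=` field, which is a quick sanity check before discarding candidates as unmappable.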