Mapping between an NCBI protein and a UniProtKB protein from a FASTA file
Asked by parism9, 4.3 years ago

I have a FASTA file produced by running BLASTp programmatically with Biopython. I want to map each entry to its corresponding UniProtKB entry (either Swiss-Prot or TrEMBL), or, if needed, to a UniParc or UniRef entry. For entries like the one below, there seems to be no corresponding protein in UniProtKB. Is that true, or am I missing a way to map between the two? I have tried matching with http://www.uniprot.org/uploadlists/, with no luck.

>gi|1198293922|gb|OUM52707.1| hypothetical protein BVG19_g1918 [[Candida] boidinii] >gi|1205206951|gb|OWB53426.1| hypothetical protein B5S27_g5022 [[Candida] boidinii] >gi|1205223768|gb|OWB70243.1| hypothetical protein B5S30_g5729 [[Candida] boidinii]
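A header like this is an nr-style merged defline: several identical sequences were collapsed into one record, and each member carries its own accession.version (OUM52707.1, OWB53426.1, OWB70243.1). Each of those is a separate candidate ID for mapping. The sketch below pulls them out with a regular expression. It assumes members are separated either by " >" as shown here or by a Ctrl-A (`\x01`) character, as in the raw nr FASTA; depending on how the file was written, the separator may differ. The function name `parse_merged_defline` is hypothetical.

```python
import re

# One member of a merged defline: gi|<number>|<db>|<accession.version>| <description>
# The description runs until the next ">" or Ctrl-A separator (an assumption).
MEMBER = re.compile(r"gi\|(\d+)\|(\w+)\|([^|]+)\|\s*([^>\x01]*)")


def parse_merged_defline(header):
    """Return a (gi, db, accession, description) tuple for every member of the header."""
    return [
        (gi, db, acc, desc.strip())
        for gi, db, acc, desc in MEMBER.findall(header)
    ]
```

Applied to the header above, this yields the three GenBank protein accessions, which can then be submitted for ID mapping together.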


The protein record can be viewed here: https://www.ncbi.nlm.nih.gov/protein/OWB83737.1?feature=any
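The same ID-mapping check can be scripted. This is a minimal sketch that assumes UniProt's current REST ID-mapping service at `rest.uniprot.org/idmapping`, which superseded the uploadlists page, with GenBank protein IDs submitted as `EMBL-GenBank-DDBJ_CDS` and mapped to `UniProtKB`. The endpoint paths, database names, and the `results`/`failedIds` response keys reflect that service as I understand it and may change. The helper names are hypothetical. Only the first page of results is read, which is enough for a handful of IDs.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: UniProt's REST ID-mapping service; names may change over time.
API = "https://rest.uniprot.org/idmapping"


def build_request(ids, from_db="EMBL-GenBank-DDBJ_CDS", to_db="UniProtKB"):
    """Build the form-encoded POST request that submits an ID-mapping job."""
    data = urllib.parse.urlencode(
        {"from": from_db, "to": to_db, "ids": ",".join(ids)}
    ).encode()
    # Supplying data= makes urllib send the request as a POST.
    return urllib.request.Request(f"{API}/run", data=data)


def fetch_json(url_or_request):
    """Open a URL or Request and decode its JSON body."""
    with urllib.request.urlopen(url_or_request) as resp:
        return json.load(resp)


def map_ids(ids, poll_seconds=3):
    """Submit a job, wait for it to finish, and return the first results page (needs network)."""
    job_id = fetch_json(build_request(ids))["jobId"]
    while True:
        # While running, the status endpoint reports jobStatus; once finished it
        # may redirect to the results, in which case jobStatus is absent.
        state = fetch_json(f"{API}/status/{job_id}").get("jobStatus")
        if state in ("NEW", "RUNNING"):
            time.sleep(poll_seconds)
        elif state in (None, "FINISHED"):
            break
        else:
            raise RuntimeError(f"ID-mapping job {job_id} ended with status {state}")
    return fetch_json(f"{API}/results/{job_id}")
```

Usage: `map_ids(["OUM52707.1", "OWB53426.1", "OWB70243.1"])` returns a dict in which successful pairs are expected under `"results"` and unmapped inputs under `"failedIds"`. An accession that lands in `failedIds` has no UniProtKB cross-reference, which matches what the uploadlists page showed.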

When I run a BLASTp search on the UniProt website, the best hit has only 97% identity; nothing matches this protein exactly.
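To make "97% identity" concrete: BLAST reports identity as identical aligned columns divided by alignment length, with gap columns counted in the length. A 97% hit over 100 columns therefore still differs at 3 positions, so it cannot be the same sequence. The toy function below computes that ratio for two already-aligned strings. It is illustrative only and takes the alignment as given; it does not perform the alignment.

```python
def percent_identity(aligned_a, aligned_b):
    """BLAST-style percent identity: identical columns / alignment length * 100.

    Gap characters ('-') never count as identical, but gap columns
    do count toward the alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return 100.0 * identical / len(aligned_a)
```

For example, `percent_identity("MKT-LV", "MKTALI")` gives 4 identical columns out of 6, about 66.7%. An exact match needs 100% identity over the full length of both sequences.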

Tags: fasta, blast, sequence alignment
Answer by darnells, 4.3 years ago

UniProt has specific criteria for deciding whether a CDS is treated as a protein: http://www.uniprot.org/help/cds_protein_definition

The sequence you listed from NCBI is a hypothetical protein, and the nr data set is littered with these. This particular sequence presumably did not meet the experimental or sequence-analysis criteria required for inclusion in TrEMBL. Swiss-Prot entries are manually curated, and many are experimentally well characterized.