Mapping between an NCBI protein and a UniProtKB protein from a FASTA file
6.8 years ago
parism9

I have a FASTA file produced by running BLASTp programmatically with Biopython. I want to map each entry to its associated UniProtKB entry (either Swiss-Prot or TrEMBL), or to a UniParc / UniRef entry if needed. For an entry like the following, there seems to be no protein in UniProtKB. Is that true, or am I missing a way to map between the two? I have used to try to match them, but with no luck.

>gi|1198293922|gb|OUM52707.1| hypothetical protein BVG19_g1918 [[Candida] boidinii] >gi|1205206951|gb|OWB53426.1| hypothetical protein B5S27_g5022 [[Candida] boidinii] >gi|1205223768|gb|OWB70243.1| hypothetical protein B5S30_g5729 [[Candida] boidinii]
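The merged nr header above actually carries three GenBank protein accessions. A minimal sketch of pulling them out so they can be submitted to an ID-mapping service, assuming the legacy `gi|<number>|<db>|<accession>|` layout shown above. The helper names are hypothetical, and the UniProt REST endpoint and the `EMBL-GenBank-DDBJ_CDS` / `UniProtKB` database names are assumptions to check against the current UniProt documentation:

```python
import re

# Matches one identifier block such as "gi|1198293922|gb|OUM52707.1|".
# Assumption: every merged record uses the legacy gi|<number>|<db>|<accession.version>| layout.
ID_BLOCK = re.compile(r"gi\|(\d+)\|(\w+)\|([A-Za-z0-9_]+\.\d+)\|")


def parse_nr_header(header: str) -> list[tuple[str, str, str]]:
    """Return (gi, database, accession.version) for every record merged into one nr header."""
    return ID_BLOCK.findall(header)


def idmapping_form(accessions: list[str]) -> dict[str, str]:
    """Build form fields for a UniProt ID mapping job (hypothetical helper).

    Assumption: https://rest.uniprot.org/idmapping/run accepts 'from', 'to' and a
    comma-separated 'ids' field; verify the database names before relying on them.
    """
    return {"from": "EMBL-GenBank-DDBJ_CDS", "to": "UniProtKB", "ids": ",".join(accessions)}


header = (">gi|1198293922|gb|OUM52707.1| hypothetical protein BVG19_g1918 [[Candida] boidinii] "
          ">gi|1205206951|gb|OWB53426.1| hypothetical protein B5S27_g5022 [[Candida] boidinii] "
          ">gi|1205223768|gb|OWB70243.1| hypothetical protein B5S30_g5729 [[Candida] boidinii]")

accessions = [acc for _gi, _db, acc in parse_nr_header(header)]
print(accessions)  # ['OUM52707.1', 'OWB53426.1', 'OWB70243.1']
print(idmapping_form(accessions))
```

The form could then be POSTed with any HTTP client; if none of the accessions was ever imported into UniProtKB, the mapping would simply come back empty.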

The protein can be referenced here:

When I run a BLASTp search on the UniProtKB website, the best hit is a protein with 97% identity; nothing matches this protein exactly.
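A 97% identity hit still differs at roughly 3 of every 100 aligned positions, so it is a close homolog rather than the same sequence. A small sketch of one common way to compute percent identity from a gapped pairwise alignment (identical residues divided by alignment columns, gaps counted as columns but never as matches), which is close to how BLAST reports its "Identities" figure; the function name is illustrative:

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over an alignment: identical residues / alignment columns * 100.

    Gap characters ('-') count as alignment columns but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


print(percent_identity("MKTAYIAKQR", "MKTAYIAKQK"))  # 90.0
```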

fasta blast sequence alignment
6.8 years ago
darnells

UniProt has specific guidelines for whether a CDS is considered to be a protein:

The sequence you listed from NCBI is a hypothetical protein, and the nr dataset is littered with these. This particular sequence must not have passed the experimental or sequence-analysis criteria required for a TrEMBL entry. Swiss-Prot entries are manually reviewed and curated, and are generally well characterized.


