Question: Mapping between an NCBI protein and a UniProtKB protein from a FASTA file
parism9 wrote (20 months ago):

I have a FASTA file produced by running BLASTp programmatically with Biopython. I want to map each entry to its associated UniProtKB entry (either Swiss-Prot or TrEMBL), or even to a UniParc/UniRef entry if needed. For an entry like the following, there seems to be no corresponding protein in UniProtKB. Is that true, or am I missing a way to map between the two? I have tried to match it using …, but with no luck.

>gi|1198293922|gb|OUM52707.1| hypothetical protein BVG19_g1918 [[Candida] boidinii] >gi|1205206951|gb|OWB53426.1| hypothetical protein B5S27_g5022 [[Candida] boidinii] >gi|1205223768|gb|OWB70243.1| hypothetical protein B5S30_g5729 [[Candida] boidinii]
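Before trying any mapping, it helps to pull the individual GenBank protein accessions out of a merged nr-style header like the one above. A minimal sketch, assuming the header is available as a single string exactly as shown; the function name `parse_nr_header` is hypothetical and not part of Biopython:

```python
import re
from typing import Dict, List

# Matches each "gi|<number>|<db>|<accession>|" block. Using a regex rather than
# splitting on ">" means it also works if records are separated by the Ctrl-A
# character that raw nr files use instead of " >".
RECORD_RE = re.compile(r"gi\|(\d+)\|(\w+)\|([^|]+)\|")


def parse_nr_header(header: str) -> List[Dict[str, str]]:
    """Return one dict per merged record: GI number, source db tag, accession."""
    records = []
    for gi, db, accession in RECORD_RE.findall(header):
        records.append({
            "gi": gi,
            "db": db,                # e.g. "gb" for GenBank
            "accession": accession,  # versioned, e.g. "OUM52707.1"
            # Some mapping services may expect the accession without ".1"
            "accession_no_version": accession.split(".")[0],
        })
    return records
```

Applied to the header above, this yields the three GenBank accessions `OUM52707.1`, `OWB53426.1` and `OWB70243.1`, each of which can then be looked up separately.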

The protein can be referenced here:

When I run a BLASTp search on the UniProt website, the best hit has only 97% identity; nothing matches this protein exactly.
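One programmatic route worth trying is UniProt's ID mapping REST service. This is a hedged sketch, not a definitive client: it assumes the endpoints under `https://rest.uniprot.org/idmapping` (`/run`, `/status/{jobId}`, `/details/{jobId}`), the source database name `EMBL-GenBank-DDBJ_CDS` for GenBank protein accessions, and the `results`/`failedIds` keys in the response, as described in UniProt's REST documentation; check the current docs before relying on them. The helper names are hypothetical, and whether versioned accessions (`.1`) are accepted is also an assumption, so trying both forms is reasonable.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumed base URL of UniProt's ID mapping REST service.
IDMAPPING = "https://rest.uniprot.org/idmapping"


def build_run_payload(accessions, from_db="EMBL-GenBank-DDBJ_CDS", to_db="UniProtKB"):
    """Form-encode a job submission: source db, target db, comma-separated IDs."""
    return urllib.parse.urlencode(
        {"from": from_db, "to": to_db, "ids": ",".join(accessions)}
    ).encode("ascii")


def _get_json(url):
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def map_to_uniprotkb(accessions, poll_seconds=3.0, max_polls=20):
    """Submit a mapping job, poll until it is done, return the decoded result JSON.

    Accessions with no UniProtKB counterpart are expected under "failedIds".
    """
    req = urllib.request.Request(f"{IDMAPPING}/run", data=build_run_payload(accessions))
    with urllib.request.urlopen(req) as resp:
        job_id = json.load(resp)["jobId"]

    for _ in range(max_polls):
        # While the job runs, /status returns {"jobStatus": ...}. When finished it
        # redirects to the results, which urllib follows automatically.
        status = _get_json(f"{IDMAPPING}/status/{job_id}")
        state = status.get("jobStatus")
        if state is None:
            return status  # already the results payload
        if state == "FINISHED":
            # Fallback: ask for the results URL explicitly.
            details = _get_json(f"{IDMAPPING}/details/{job_id}")
            return _get_json(details["redirectURL"])
        if state not in ("NEW", "RUNNING"):
            raise RuntimeError(f"mapping job {job_id} ended with status {state!r}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"mapping job {job_id} did not finish after {max_polls} polls")
```

Usage would look like `result = map_to_uniprotkb(["OUM52707.1", "OWB53426.1", "OWB70243.1"])`, followed by inspecting `result.get("failedIds", [])`. An accession listed there has no UniProtKB entry reachable through this mapping. Large batches are paginated, which this sketch does not handle.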

darnells wrote (20 months ago):

UniProt has specific guidelines for deciding whether a CDS is included in UniProtKB as a protein entry:

The sequence you listed from NCBI is a hypothetical protein, and NCBI's nr database is full of these. This particular sequence presumably did not meet the experimental or sequence-analysis criteria required for a TrEMBL entry. Swiss-Prot entries, by contrast, are manually curated and generally well characterized.
