Question: Mapping between an NCBI protein and a UniProtKB protein from a FASTA file
parism9 wrote (14 months ago):

I have a FASTA file produced by running BLASTp programmatically through Biopython. I want to map each entry to its corresponding UniProtKB entry (Swiss-Prot or TrEMBL), or to a UniParc / UniRef entry if necessary. For an entry like the following, there seems to be no matching protein in UniProtKB. Is this true, or am I missing a way to map between the two? I have tried http://www.uniprot.org/uploadlists/ to match them, but with no luck.

>gi|1198293922|gb|OUM52707.1| hypothetical protein BVG19_g1918 [[Candida] boidinii] >gi|1205206951|gb|OWB53426.1| hypothetical protein B5S27_g5022 [[Candida] boidinii] >gi|1205223768|gb|OWB70243.1| hypothetical protein B5S30_g5729 [[Candida] boidinii]
MALAKAASINDDIHDLTMRAFRCYVLDLVEQYEGGHPGSAMGMVAMGIALWKYTMKYSPNDATWFNRDRFVLSNGHVCLFQYLFQHLSGLKSMTEKQLKSYHSSDYHSKCPGHPEIENEAVEVTTGPLGQGISNSVGLAIASKNLGALYNKPGYEVVNNTTYCIVGDACLQEGPALESISFAGHLGLDNLIVIYDNNQVCCDGSVDIANTEDISAKFRACNWNVIEVEDGARDVATIVKALELAGAEKNRPTLINVRTIIGTDSAFQNHCAAHGSALGEEGVRELKIKYGFNPSQKFHFPQEVYDFFADLPAKGDEYVSNWKKLVSSYVKEYPELGAEFQARVRGELPKNWKSLLPNELPSEDTATRTSARAMVRAFAKDVPNVIAGSADLSVSVNLPWPGSKYFENPQLATQCGLAGDYSGRYVEFGIREHCMCAIANGLAAFNKGTFIPITSSFYMFYLYAAPALRMAALQELKAIHIATHDSIGAGEDGPTHQPIAQSALWRAMPNFYYMRPGDASEVRGLFEKAVELPLSTLFSLSRHEVPQYPGKSSVELAKRGGYVFEDAKDADVQLIGAGSELEQTVKTARLLRSRGLKVRILSFPCQRLFDEQSVGYRRSVLQRGKVPTVVIEAYVAYGWERYATAGYNMNTFGKSLPVEDVYEYFGFNPSEISKKIEGYVRAVKSNPDLLYEFIDLKEKPKHDQNHL

The protein can be referenced here: https://www.ncbi.nlm.nih.gov/protein/OWB83737.1?feature=any

When I run a BLASTp search on the UniProtKB page, the best hit has only 97% identity; nothing matches this protein exactly.
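
For reference, the header above is a legacy NCBI nr-style defline in which several identical-sequence records are merged into one line. Before trying any ID mapping, it can help to pull out every accession rather than only the first. The following is a minimal sketch, assuming the `gi|<number>|<db>|<accession>|` layout seen in this post and that descriptions never contain the text `>gi|`; the function name `parse_nr_defline` is hypothetical and not part of Biopython.

```python
import re

# One merged record looks like: gi|<number>|<db>|<accession>| <description>
# Records are separated by " >" in this file (an assumption based on the
# header shown in the post).
_RECORD = re.compile(
    r"gi\|(?P<gi>\d+)\|(?P<db>[a-z]+)\|(?P<accession>[^|\s]+)\|"
    r"\s*(?P<description>.*?)(?=\s*>gi\||$)"
)


def parse_nr_defline(defline):
    """Split a merged nr-style defline into one dict per source record."""
    return [m.groupdict() for m in _RECORD.finditer(defline.strip())]


def accessions(defline):
    """Return only the accession.version strings, in order of appearance."""
    return [rec["accession"] for rec in parse_nr_defline(defline)]
```

Calling `accessions()` on the header above should give the three GenBank protein accessions (`OUM52707.1`, `OWB53426.1`, `OWB70243.1`), each of which can then be tried separately in the mapping tool. Whether the tool expects the version suffix is something to check against its documentation.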

Tags: blast, alignment, sequence, fasta
darnells wrote (14 months ago):

UniProt has specific guidelines for whether a CDS is considered to be a protein: http://www.uniprot.org/help/cds_protein_definition

The sequence you listed from NCBI is a hypothetical protein; the nr data set is littered with these. This particular sequence presumably did not meet the experimental or sequence-analysis criteria required for a TrEMBL entry. Swiss-Prot entries, by contrast, are manually curated and experimentally well characterized.
