Mapping between an NCBI protein and a UniProtKB protein from a FASTA file
6.6 years ago
parism9 ▴ 30

I have a FASTA file produced by running BLASTp programmatically through Biopython. I want to map each entry to its associated UniProtKB entry (either Swiss-Prot or TrEMBL), or failing that to a UniParc or UniRef entry. For an entry like the following, there seems to be no matching protein in UniProtKB. Is this true, or am I missing a way to map between the two? I tried matching with http://www.uniprot.org/uploadlists/, but had no luck.

>gi|1198293922|gb|OUM52707.1| hypothetical protein BVG19_g1918 [[Candida] boidinii] >gi|1205206951|gb|OWB53426.1| hypothetical protein B5S27_g5022 [[Candida] boidinii] >gi|1205223768|gb|OWB70243.1| hypothetical protein B5S30_g5729 [[Candida] boidinii]
MALAKAASINDDIHDLTMRAFRCYVLDLVEQYEGGHPGSAMGMVAMGIALWKYTMKYSPNDATWFNRDRFVLSNGHVCLFQYLFQHLSGLKSMTEKQLKSYHSSDYHSKCPGHPEIENEAVEVTTGPLGQGISNSVGLAIASKNLGALYNKPGYEVVNNTTYCIVGDACLQEGPALESISFAGHLGLDNLIVIYDNNQVCCDGSVDIANTEDISAKFRACNWNVIEVEDGARDVATIVKALELAGAEKNRPTLINVRTIIGTDSAFQNHCAAHGSALGEEGVRELKIKYGFNPSQKFHFPQEVYDFFADLPAKGDEYVSNWKKLVSSYVKEYPELGAEFQARVRGELPKNWKSLLPNELPSEDTATRTSARAMVRAFAKDVPNVIAGSADLSVSVNLPWPGSKYFENPQLATQCGLAGDYSGRYVEFGIREHCMCAIANGLAAFNKGTFIPITSSFYMFYLYAAPALRMAALQELKAIHIATHDSIGAGEDGPTHQPIAQSALWRAMPNFYYMRPGDASEVRGLFEKAVELPLSTLFSLSRHEVPQYPGKSSVELAKRGGYVFEDAKDADVQLIGAGSELEQTVKTARLLRSRGLKVRILSFPCQRLFDEQSVGYRRSVLQRGKVPTVVIEAYVAYGWERYATAGYNMNTFGKSLPVEDVYEYFGFNPSEISKKIEGYVRAVKSNPDLLYEFIDLKEKPKHDQNHL

The protein can be referenced here: https://www.ncbi.nlm.nih.gov/protein/OWB83737.1?feature=any

When I run a BLASTp search on the UniProtKB page, the best hit has only 97% identity; nothing matches this protein exactly.

fasta blast sequence alignment • 1.5k views
6.6 years ago
darnells ▴ 30

UniProt has specific guidelines for whether a CDS is considered to be a protein: http://www.uniprot.org/help/cds_protein_definition

The sequence you listed from NCBI is a hypothetical protein; the nr data set is littered with these. This particular sequence presumably did not pass the experimental or sequence-analysis criteria required for a TrEMBL entry. Swiss-Prot entries, by contrast, are manually curated and experimentally well characterized.
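If you want to retry the ID mapping before concluding there is no entry, note that the header in the question merges several identical nr records, each with its own GenBank protein accession. Below is a minimal sketch for pulling those accessions out so they can be pasted into the mapping form, one per line. The helper name `extract_accessions` and the regular expression are my own, not part of Biopython, and the pattern assumes the old `gi|...|gb|ACCESSION.VERSION|` header style shown in the question. As far as I know, the matching source database on the UniProt mapping page is the EMBL/GenBank/DDBJ CDS option, but check the current list there.

```python
import re

# Matches NCBI-style "db|ACCESSION.VERSION|" fields (GenBank, EMBL, DDBJ, RefSeq).
# Assumption: headers follow the legacy gi|...|gb|...| layout from the question.
ACCESSION_RE = re.compile(r"\b(?:gb|emb|dbj|ref)\|([A-Z0-9_]+\.\d+)\|")


def extract_accessions(header: str) -> list[str]:
    """Return the protein accessions found in a merged nr-style FASTA header.

    Order of appearance is preserved and duplicates are dropped.
    """
    seen = []
    for acc in ACCESSION_RE.findall(header):
        if acc not in seen:
            seen.append(acc)
    return seen


# The header line from the question, split across literals for readability.
header = (
    ">gi|1198293922|gb|OUM52707.1| hypothetical protein BVG19_g1918 [[Candida] boidinii] "
    ">gi|1205206951|gb|OWB53426.1| hypothetical protein B5S27_g5022 [[Candida] boidinii] "
    ">gi|1205223768|gb|OWB70243.1| hypothetical protein B5S30_g5729 [[Candida] boidinii]"
)

accessions = extract_accessions(header)
print("\n".join(accessions))  # one accession per line, ready to paste
```

Trying every accession in the merged header matters because any one of the identical records might be the one cross-referenced by UniProt. If none of them map, that is consistent with the explanation above.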
