Strange issue retrieving sequences from a local BLAST database made using a Uniprot proteome
7 months ago

I have installed BLAST on a machine running Ubuntu with a simple

sudo apt install ncbi-blast+

Next, I created a BLAST database using a reference proteome I downloaded from Uniprot using a command like this:

makeblastdb -dbtype prot -in <FASTA> -out <NAME> -taxid <####> -parse_seqids

Later, I compiled a list of Uniprot accession numbers I wanted to extract into a FASTA file and tried to do so with a command like this:

blastdbcmd -db <NAME> -entry_batch <List of Accessions>

but got a bunch of error messages that looked like this:

Error: [blastdbcmd] Skipped <Accession X>

However, I can query any of the skipped accession numbers individually, like this:

blastdbcmd -db <NAME> -entry <Accession X>

After trying to figure this out for a while, I realized that batch queries do work if my accessions are formatted like sp|M7U9B9|ATG7_BOTF1 instead of simply M7U9B9. I have seen this issue recur in some, but not all, of the reference proteomes I downloaded from Uniprot. I know I can just upload my list of accessions to the Uniprot website and get my FASTA file that way, but I would like to understand how and why this issue is occurring. If anyone can shed light on that or offer solutions (other than submitting my list to the Uniprot website), that would be much appreciated.
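
The workaround described above (querying with the full sp|ACC|NAME identifier) can be sketched as a small shell helper. This is only an illustration under assumptions: the function name full_ids and the file names are hypothetical, and it assumes the FASTA headers follow the usual Uniprot layout of >sp|ACC|NAME or >tr|ACC|NAME followed by a description.

```shell
# Hypothetical helper: print the full identifier (e.g. sp|M7U9B9|ATG7_BOTF1)
# for every bare accession listed in the second file.
#   $1 = FASTA file with Uniprot-style headers (assumed layout)
#   $2 = text file with one bare accession per line
full_ids() {
    awk -F'|' '
        NR == FNR { want[$1]; next }      # first file read: remember each accession
        /^>/ && ($2 in want) {            # header whose accession was requested
            sub(/^>/, "", $0)             # drop the leading ">"
            split($0, parts, " ")         # cut off the free-text description
            print parts[1]
        }' "$2" "$1"
}

# Usage (file names are placeholders):
#   full_ids proteome.fasta accessions.txt > full_ids.txt
#   blastdbcmd -db <NAME> -entry_batch full_ids.txt
```

Note that accessions missing from the FASTA file are silently dropped by this sketch, so it is worth comparing line counts between the input list and the output.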

blast protein sequence fasta uniprot • 749 views

NCBI recommends using the lcl| prefix (i.e. lcl|Accession_X) in FASTA headers when you create local databases. If you do that, your batch queries should work.
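
One way to apply this to a Uniprot download is to rewrite the headers before running makeblastdb. The following is a minimal sketch, not a tested recipe for every proteome: the function name to_lcl_headers and the file names are made up for illustration, and it assumes headers of the form >sp|ACC|NAME or >tr|ACC|NAME. It discards the rest of the header line, so the description is lost.

```shell
# Hypothetical helper: turn ">sp|ACC|NAME description" (or ">tr|...") into ">lcl|ACC".
# Reads FASTA on stdin and writes FASTA on stdout; sequence lines and
# headers in any other format pass through unchanged.
to_lcl_headers() {
    sed -E 's/^>(sp|tr)\|([^|]+)\|.*/>lcl|\2/'
}

# Usage (file names are placeholders):
#   to_lcl_headers < proteome.fasta > new.fa
#   makeblastdb -dbtype prot -in new.fa -out temp -parse_seqids
#   blastdbcmd -db temp -entry_batch id
```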

7 months ago
GenoMax 118k

The following works with BLAST v2.12.0. Output is truncated for space.

$ grep ">" new.fa
>lcl|P0DOD0
>lcl|Q07846
>lcl|Q07849
>lcl|Q07852
>lcl|Q07854
>lcl|Q07857
>lcl|Q07860
>lcl|Q07862

$ makeblastdb -dbtype prot -in new.fa -out temp -parse_seqids

$ more id
Q07854
Q07860

$ blastdbcmd -db temp -entry_batch id
>Q07854 
MADGRPATLDDFCRRFDISFFDLRLTCIFCSHTVDLADLALFYLKKLSLVFRGNCYYACCSECLRLSALFEQENYFQCSI
KAVHLEEIAQKKIKEICIRCICCLRLLDIVEKLDLLYSDETCYLIRGLWRGYCRNCIRKQ
>Q07860 
MSSWLSTTGKVYLPPAQPVARVLETDEYITGTSLYFHAGTERLLTVGHPYFPVKDVQEPHKVLVPKVSGSQFRVFRFNLP
DPNRFALIDNGFYDSDHERLVWKLRGIEIGRGGPLGIGTTGHPLYNKFGDTENPNGYKKQSDDNRQDVSLDPKQTQMFII