I installed BLAST on a machine running Ubuntu with a simple:
sudo apt install ncbi-blast+
Next, I created a BLAST database from a reference proteome I downloaded from UniProt, using a command like this:
makeblastdb -dbtype prot -in <FASTA> -out <NAME> -taxid <####> -parse_seqids
Later, I compiled a list of UniProt accession numbers whose sequences I wanted to extract into a FASTA file, and tried to do so with a command like this:
blastdbcmd -db <NAME> -entry_batch <List of Accessions>
but got a bunch of error messages that looked like this:
Error: [blastdbcmd] Skipped <Accession X>
I can query any of the skipped accession numbers individually like this:
blastdbcmd -db <NAME> -entry <Accession X>
After trying to figure this out for a while, I realized that batch queries do work if my accessions are formatted like this:
sp|M7U9B9|ATG7_BOTF1
instead of simply:
M7U9B9
I have seen this issue recur in some, but not all, of the reference proteomes I downloaded from UniProt. I know I can just upload my list of accessions to the UniProt website and get my FASTA file that way, but I would like to understand how and why this issue occurs. If anyone can shed light on that or offer solutions (other than submitting my list to the UniProt website), that would be much appreciated.
NCBI recommends that you use the prefix
lcl|Accession_X
in FASTA headers when you create local databases. If you do that, your queries should work.
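One way to apply that suggestion is to rewrite the headers before running makeblastdb. The following is a minimal sketch, not a tested recipe for every proteome: it assumes headers of the form >sp|ACCESSION|ENTRY_NAME or >tr|ACCESSION|ENTRY_NAME, as in the question, and the file names and the two sample records are made up for illustration.

```shell
# Build a tiny example FASTA with UniProt-style headers.
# The second record and both sequences are invented for this sketch.
workdir=$(mktemp -d)
cat > "$workdir/proteome.fasta" <<'EOF'
>sp|M7U9B9|ATG7_BOTF1 Example description OS=Example organism
MSTNPKPQRK
>tr|A0A000|A0A000_EXAMP Another example entry
MKVLAAGIVG
EOF

# Turn ">sp|ACC|NAME rest" (or ">tr|...") into ">lcl|ACC NAME rest".
# The accession becomes the sequence ID; the entry name and description
# are kept as free text after the first space. Other lines pass through.
sed -E 's/^>(sp|tr)\|([^|]+)\|/>lcl|\2 /' \
  "$workdir/proteome.fasta" > "$workdir/proteome.lcl.fasta"

# Show the rewritten headers.
grep '^>' "$workdir/proteome.lcl.fasta"
```

The rewritten file can then be passed to makeblastdb with -parse_seqids as in the question, after which the bare accessions in the batch list should match the sequence IDs. It is worth checking a few entries with blastdbcmd -entry first, since how the lcl| prefix is handled may differ between BLAST+ versions.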