I'm trying to create a custom BLAST database from a dozen or so whole genomes. The downstream analyses need the taxon IDs to be included in the BLAST database. This seems like it should be simple enough using the -taxid_map <taxmap.txt> option of makeblastdb, but alas, apparently not.
My fasta headers look like:
>HG380758.1
>HG380759.1
>HG380760.1
...
and my taxid_map.txt file looks like:
HG380758.1 104782
HG380759.1 104782
HG380760.1 104782
...
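For reference, a map file with one whitespace-separated "seqid taxid" pair per line can be generated straight from the FASTA headers, so the IDs cannot drift apart through manual editing. The following is only a minimal Python sketch: the function names fasta_ids and write_taxid_map are made up for illustration and are not part of BLAST+, and it assumes that every sequence in the file belongs to the same taxon and that the sequence ID is the first whitespace-delimited token of each header.

```python
def fasta_ids(fasta_path):
    """Yield the first whitespace-delimited token of every FASTA header line."""
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:  # skip a bare ">" that carries no identifier
                    yield fields[0]


def write_taxid_map(fasta_path, map_path, taxid):
    """Write one 'seqid taxid' line per sequence and return the number written.

    Assumption: all sequences in fasta_path share the single taxid given.
    """
    count = 0
    with open(map_path, "w", newline="\n") as out:
        for seq_id in fasta_ids(fasta_path):
            out.write(f"{seq_id} {taxid}\n")
            count += 1
    return count
```

With the file names used in this post, the call would be write_taxid_map("in.fna", "taxid_map.txt", 104782).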
The command I've run is then:
makeblastdb -dbtype nucl -in in.fna -parse_seqids -taxid_map taxid_map.txt
Unfortunately, this returns the error:
Building a new DB, current time: 07/08/2016 11:53:59
New DB name: in.fna
New DB title: in.fna
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 36167 sequences in 8.00621 seconds.
Error: [makeblastdb] No sequences matched any of the taxids provided.
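One thing that can be ruled out independently of makeblastdb is a plain string mismatch between the IDs in the FASTA headers and the IDs in the map file (for example a missing version suffix or a stray ">"). Below is a minimal Python sketch of such a check. The function names are hypothetical, it assumes the ID is the first whitespace-delimited token on each header or map line, and it says nothing about how makeblastdb itself parses the IDs when -parse_seqids is given.

```python
def read_fasta_ids(fasta_path):
    """Return the set of first header tokens, without the leading '>'."""
    ids = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def read_map_ids(map_path):
    """Return the set of first-column values of the taxid map, taken as-is."""
    ids = set()
    with open(map_path) as handle:
        for line in handle:
            fields = line.split()
            if fields:  # ignore blank lines
                ids.add(fields[0])
    return ids


def compare_ids(fasta_path, map_path):
    """Return (IDs only in the FASTA file, IDs only in the map), both sorted."""
    fasta = read_fasta_ids(fasta_path)
    mapped = read_map_ids(map_path)
    return sorted(fasta - mapped), sorted(mapped - fasta)
```

If compare_ids("in.fna", "taxid_map.txt") returns two empty lists, the two files agree on the ID strings and the problem lies elsewhere.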
I've read the questions "How To Make A Blast Database With Taxonids From Ncbi Query" and "Ncbi Blast+ Taxid And Taxid_Map" (and also http://www.verdantforce.com/2014/12/building-blast-databases-with-taxonomy.html), and I can't see what I'm doing wrong. I have also tried reformatting the FASTA headers to include the taxid, as in >HG380758.1 taxon=104782, and including the > prefix on the sequence IDs in the taxid_map.txt file, but to no avail.
I'm using makeblastdb version 2.3.0+, and I note from previous similar questions that the -taxid_map parameter has not always been functional.
Is this a bug with this version of makeblastdb, or am I still doing something wrong? Any help / workarounds would be much appreciated!