I have some questions about blast-2.10.1+. I have made a db as follows:
"path/to/NCBI/blast-2.10.1+/bin/makeblastdb.exe" -dbtype "prot" -in "path/to/fasta_db.txt" -parse_seqids -hash_index
I have then used this in a blast+ task run using python's subprocess module - the equivalent command line entry is:
blastp -out results\out\path -query fasta\query.txt -db db_name -evalue eval -outfmt 5 -num_threads thread_num -parse_deflines -seg 'yes' -soft_masking 'true' -use_sw_tback -num_alignments '200'
When I look at the xml results, I get inconsistent hit id's (e.g.
ref|expected_accession|). Can you tell me why this is happening, how to get the accession as the hit id with no prefixes, and, if that is not possible, whether I can assume the hit_id will always be contained within two '|' characters?
As you can see for the HSP below (sorry the formatting wont work properly), I also seem to be getting HSPs of length 0, which I assume is an issue relating to my blast command - however the other metrics (score etc) are there. Would you be able to tell me why this is based on the commands I have provided? I think it is an issue specific to this database (I have performed the same blast call with a different db).
<Hit> <Hit_num>6</Hit_num> <Hit_id>ref|YP_003273669.1|</Hit_id> <Hit_def>Unknown</Hit_def> <Hit_accession>Unknown</Hit_accession> <Hit_len>0</Hit_len> <Hit_hsps> <Hsp> <Hsp_num>1</Hsp_num> <Hsp_bit-score>65.855</Hsp_bit-score> <Hsp_score>159</Hsp_score> <Hsp_evalue>5.2884e-10</Hsp_evalue> <Hsp_query-from>0</Hsp_query-from> <Hsp_query-to>0</Hsp_query-to> <Hsp_hit-from>0</Hsp_hit-from> <Hsp_hit-to>0</Hsp_hit-to> <Hsp_identity>63</Hsp_identity> <Hsp_qseq></Hsp_qseq> <Hsp_hseq></Hsp_hseq> </Hsp> </Hit_hsps> </Hit>
EDIT I noticed that there was an extra file when I made the DB (.pdb-lock), compared to when I had done this previously with success. I tried remaking the database as described above after deleteing all of the old files, and the same thing happened. I then tried making the database without -parse_seqids, the lock file wasn't generated, and I got normal HSPs with a len >0. All of this is leading me to suspect its an issue with how my database is being made, but I've made other dbs with -parse_seqids with no issues before...