HSP of length 0
0
0
Entering edit mode
14 months ago

Hello,

I have some questions about blast-2.10.1+. I have made a db as follows:

"path/to/NCBI/blast-2.10.1+/bin/makeblastdb.exe" -dbtype "prot" -in "path/to/fasta_db.txt" -parse_seqids -hash_index

I have then used this in a blast+ task run using python's subprocess module - the equivalent command line entry is:

blastp -out results\out\path -query fasta\query.txt -db db_name -evalue eval -outfmt 5 -num_threads thread_num -parse_deflines -seg 'yes' -soft_masking 'true'  -use_sw_tback -num_alignments '200'

When I look at the xml results, I get inconsistent hit id's (e.g. dbj|expected_accession|, gb|expected_accession|, ref|expected_accession|). Can you tell me why this is happening, how to get the accession as the hit id with no prefixes, and, if that is not possible, whether I can assume the hit_id will always be contained within two '|' characters?

As you can see for the HSP below (sorry the formatting wont work properly), I also seem to be getting HSPs of length 0, which I assume is an issue relating to my blast command - however the other metrics (score etc) are there. Would you be able to tell me why this is based on the commands I have provided? I think it is an issue specific to this database (I have performed the same blast call with a different db).

<Hit>
  <Hit_num>6</Hit_num>
  <Hit_id>ref|YP_003273669.1|</Hit_id>
  <Hit_def>Unknown</Hit_def>
  <Hit_accession>Unknown</Hit_accession>
  <Hit_len>0</Hit_len>
  <Hit_hsps>
    <Hsp>
      <Hsp_num>1</Hsp_num>
      <Hsp_bit-score>65.855</Hsp_bit-score>
      <Hsp_score>159</Hsp_score>
      <Hsp_evalue>5.2884e-10</Hsp_evalue>
      <Hsp_query-from>0</Hsp_query-from>
      <Hsp_query-to>0</Hsp_query-to>
      <Hsp_hit-from>0</Hsp_hit-from>
      <Hsp_hit-to>0</Hsp_hit-to>
      <Hsp_identity>63</Hsp_identity>
      <Hsp_qseq></Hsp_qseq>
      <Hsp_hseq></Hsp_hseq>
    </Hsp>
  </Hit_hsps>
</Hit>

Thanks, Tim

EDIT I noticed that there was an extra file when I made the DB (.pdb-lock), compared to when I had done this previously with success. I tried remaking the database as described above after deleteing all of the old files, and the same thing happened. I then tried making the database without -parse_seqids, the lock file wasn't generated, and I got normal HSPs with a len >0. All of this is leading me to suspect its an issue with how my database is being made, but I've made other dbs with -parse_seqids with no issues before...

alignment blastp • 533 views
ADD COMMENT
0
Entering edit mode
<Hit_id>ref|YP_003273669.1|</Hit_id>

This may have something to do with it:

record

ADD REPLY
0
Entering edit mode

Hi Genomax, I made a webscraper to get the sequences, although they were removed even when I made the webscraper. The sequences were then made into a local database, so i dont think it should be an issue (it might not be in ncbi's databases but it is in mine). I've also checked all the formating of deflines etc, so it's a bit of a mystery!

From the fasta I used to make the db:

>YP_003273669.1
MVRAVGPADRDLDRTDTAVGRTMRCYAVFAGAAYLVFLPFLIPAITQAHAVVATWWTPSAMILAVLPGFLVAGHAIGSSDLLRLRVLSSIAALGVLLAVALWPMAWNGNPIDDATVFWLGPFVVLGTICALVAWPTAVTATYVVIAAVGVKFVGNIAQSNAGVTAMIADIACSTVFALVPFALGTAGLRTAAALDRAAASAIAQAQESGAARARVMERAQARALTHDMIMANLLEAARGGASARLPDRAAATVRALDKTLRSAAVEREPDGDEVTRTIVDVIEENDPQMSISMAGRDDEVHVYPPAVVDAVLGAVAEAARNCARHLPADVVRRCRIDLRAGRISVRIDDDGAGFVRSSVAPERLGISSSIEKRMSEVAGGTATVWSRPGAGTSVQVRWDPPGAIAIADGSDALGMRSRTARWLAGVIVGLEVLAAVLNPGELRSPWVALLALLLMACAAAVLVAPGPDPLAGWGTLFVALTPAVMCGIYLFALDPPGQLVPIWITGPPALIVAMLAMRGRPSSAALGFVVTLAVAAWWSSVDAGDALRGVLLIGPSGGFVAIAIVYAAVNRALVGALADAQRQAADAAAEAAAIRAAADERALHLAPITALARPLLALLAEEGNAVDAQMRLECAVIEAIFGICCGRGPCAHPTLSMRSTRPVVVV
ADD REPLY
0
Entering edit mode

Since we can't see/access your data this may be difficult to diagnose. Someone else may have additional insight. Use blastdbcheck -db DB to verify that database looks good.

ADD REPLY

Login before adding your answer.

Traffic: 2437 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6