I am blasting a local fasta file of protein sequences against a local protein database. The Blast works fine and I am able to generate an XML file as my output. I would like to extract the query and top hit sequences from this XML file. However, my HSPs (high-scoring pair objects) in the XML file have missing information.
Here is the code that generates the XML file:
blast_handle = NcbiblastpCommandline(query='/PATH/to/proteins_of_interest.fasta', db='/PATH/to/local/db/protein.fa', outfmt=5, out='/PATH/to/results.xml') stdout, stderr = blast_handle() result_handle = open('/PATH/to/results.xml') blast_records = NCBIXML.parse(result_handle)
I successfully get an XML file, and the following loop gives me the name of the top hit of each Blast (there were four protein sequences in my query fasta file):
for blast_record in blast_records: print(blast_record.alignments.hit_id)
However, my XML file is missing HSP data. This is a portion of the XML file:
<Hit> <Hit_num>9</Hit_num> <Hit_id>lcl|name_of_my_hit</Hit_id> <Hit_def>Unknown</Hit_def> <Hit_accession>Unknown</Hit_accession> <Hit_len>0</Hit_len> <Hit_hsps> <Hsp> <Hsp_num>1</Hsp_num> <Hsp_bit-score>105.145</Hsp_bit-score> <Hsp_score>261</Hsp_score> <Hsp_evalue>9.96617e-26</Hsp_evalue> <Hsp_query-from>0</Hsp_query-from> <Hsp_query-to>0</Hsp_query-to> <Hsp_hit-from>0</Hsp_hit-from> <Hsp_hit-to>0</Hsp_hit-to> <Hsp_identity>60</Hsp_identity> <Hsp_qseq></Hsp_qseq> <Hsp_hseq></Hsp_hseq> </Hsp> </Hit_hsps> </Hit>
As you can see, there is nothing between the Hsp_qseq and Hsp_hseq start and end tags. Does anyone know why the amino acid sequences aren't here? Thanks!