how to get the strand from a blast xml file
1
0
Entering edit mode
6.5 years ago
bio90029 ▴ 10

Hi, I am working with a blast xml as the one below:

 <Hit_num>2</Hit_num>
  <Hit_id>gnl|BL_ORD_ID|79</Hit_id>
  <Hit_def>NODE_82_length_56137_cov_22.281775</Hit_def>
  <Hit_accession>79</Hit_accession>
  <Hit_len>56195</Hit_len>
  <Hit_hsps>
    <Hsp>
      <Hsp_num>1</Hsp_num>
      <Hsp_bit-score>80.5295</Hsp_bit-score>
      <Hsp_score>46</Hsp_score>
      <Hsp_evalue>3.65762e-15</Hsp_evalue>
      <Hsp_query-from>274</Hsp_query-from>
      <Hsp_query-to>515</Hsp_query-to>
      <Hsp_hit-from>17046</Hsp_hit-from>
      <Hsp_hit-to>17290</Hsp_hit-to>
      <Hsp_query-frame>1</Hsp_query-frame>
      <Hsp_hit-frame>1</Hsp_hit-frame>
      <Hsp_identity>182</Hsp_identity>
      <Hsp_positive>182</Hsp_positive>
      <Hsp_gaps>13</Hsp_gaps>
      <Hsp_align-len>250</Hsp_align-len>
      <Hsp_qseq>ATGCGCGAAAAAGACGAGAAAATTCCGCTGCGTCACGGTGTGCTGGGTCAGGAAACCTGCGGCCCGGGTGGGATCTCTTACGGCATGCGCTCCAT-CGGCGGCGTACTGG-A-A-CTGGTGGATTATATGGAGCAGTACTC-CCCG--AACGCGTGGATGCTGAACTACTCCAACCCGGCGGCGATTGTGGCGGAAG-CGACGCGCCGCCTGCGTCCGCATGCCAAAATTCTCAACATCTGCGATATGCC</Hsp_qseq>
      <Hsp_hseq>ATGCGCGAGCAGGATGAAAAGATCCCGCTGCGCCACGGGGTGGTCGGCCAGGAAACCTGCGGACCCGGCGGGCTGGCATATGGCCTGCGTACTATCCTGC--CG-A-TGGTAGAGCTGATCGATCGGGTGGAACGCTACGCGCACGAAAAGGCGTGGATCGTGAACTACTCCAACCCGGCGGCGATTGTCGCTGAGGGCGT-GCGTCGCCTGCGCCCCAACGCACGCGTGCTGAACATCTGCGATATGCC</Hsp_hseq>
      <Hsp_midline>||||||||  | || || || || |||||||| ||||| ||| | || |||||||||||||| || || ||| |  | || ||| ||||  | || | ||  || | ||| | | ||| | |||    |||| |  ||| | | ||  || ||||||||  |||||||||||||||||||||||||||| || || | ||  ||| |||||||| ||  | ||     | || |||||||||||||||||</Hsp_midline>
    </Hsp>

I am using a script based on the one provide by biopython Cook's book (below) but this is what I get;

(None, None)

from Bio.Blast import NCBIXML
 blast_record = NCBIXML.read(result_handle)
for alignment in blast_record.alignments:
... for hsp in alignment.hsps:
        print hsp.strand

Can anyone help me to extract in which strand the subj.sequence is, please? I would like to do this so I can work with the sequence. What I would like to do is to get to the contigs and add the missing bases but if it is in the strand -1 I need to reversed_complement() the sequence. Thanks

python biopython • 2.1k views
ADD COMMENT
2
Entering edit mode
6.3 years ago
Tm ★ 1.1k

you may try greping <hsp_qseq> and <hsp_hseq> lines showing alignment of query with that of subject (irrepective of the strand in which it belong) and can replace the missing base in contigs.

ADD COMMENT
0
Entering edit mode

Thanks, Unluckily, I cannot used grep as I wanted it as part of a python programme I was working on. By now I had finished it. I worked out the strand by the start and end positions.

ADD REPLY

Login before adding your answer.

Traffic: 2169 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6