Question: Extracting the <Hit_def> from Blast xml output using Biopython and saving in .csv
3.8 years ago by
Anushka20 wrote:

I have BLAST output in .xml form and I want to retrieve a few attributes such as <Hit_def>. I found this parser in Biopython:

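from Bio.Blast import NCBIXML

# Parse the BLAST XML one query record at a time ('rU' mode no longer exists in recent Python 3)
with open('output.xml') as handle:
    for record in NCBIXML.parse(handle):
        for align in record.alignments:
            for hsp in align.hsps:
                print(hsp.score, align.hit_def)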

Q: The code above only prints the output to the terminal. Could anyone help me store the output in a .csv file?

Specifically, I need an output.csv with the attributes <Iteration_query-def>, <Hit_def>, <Hsp_score>, and <Hsp_evalue> as columns.

Q2: How can I get the result for just the best hit of each query? Would setting -max_target_seqs to 1 when running blastp do the same?

The following is a segment of my input XML:

          <Hit_def>low-density lipoprotein receptor-related protein 6 precursor [Homo sapiens] &gt;gi|578822872|ref|XP_006719141.1| PREDICTED: low-density lipoprotein receptor-related protein 6 isoform X1 [Homo sapiens]</Hit_def>
              <Hsp_midline>+N C   +  C H+CL R  G   C C  GF L+S  K C+   V + ++L +     R   L    +        V +  A+D D VTD+RIY   +  KT   A+ + SA E V  +G       D    +      K +YW   TG    + VS    +   V  + D    R + +D     +YW E+</Hsp_midline>
              <Hsp_midline>NEC  S   C H+CLA   GGFVC C   ++L +  +  S   T            +V D  Q     LPI  S RNV    AID D + D ++Y</Hsp_midline>

I would really appreciate your help.


Using xsltproc rather than Python would be straightforward.

written 3.8 years ago by Pierre Lindenbaum (112k)
3.8 years ago by
Houston, TX
RamRS (17k) wrote:

You could redirect the output to a CSV file using file I/O. Open a file in write mode and change the print statement so that it writes to the file instead. Here's one of many resources:

Google away for more. This link should help you get the attributes you require.
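
For illustration, here is a minimal sketch of the file-writing approach using Python's standard csv module. It assumes Biopython's NCBIXML record attributes map onto the requested tags as record.query (<Iteration_query-def>), align.hit_def (<Hit_def>), hsp.score (<Hsp_score>) and hsp.expect (<Hsp_evalue>). The function names blast_rows, write_blast_csv and xml_to_csv are made up for this example.

```python
import csv


def blast_rows(records):
    """Yield one (query, hit_def, score, evalue) tuple per HSP."""
    for record in records:
        for align in record.alignments:
            for hsp in align.hsps:
                yield (record.query, align.hit_def, hsp.score, hsp.expect)


def write_blast_csv(records, csv_path):
    """Write a header line followed by one CSV row per HSP."""
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["Iteration_query-def", "Hit_def", "Hsp_score", "Hsp_evalue"])
        writer.writerows(blast_rows(records))


def xml_to_csv(xml_path, csv_path):
    """Parse a BLAST XML file with Biopython and save it as CSV."""
    # Imported here so the two helpers above can be used without Biopython installed.
    from Bio.Blast import NCBIXML

    with open(xml_path) as handle:
        write_blast_csv(NCBIXML.parse(handle), csv_path)
```

Calling xml_to_csv("output.xml", "output.csv") should then produce the file asked for. The csv module takes care of quoting, which matters here because <Hit_def> strings often contain commas.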

Q2: Best hit is an ambiguous term. Each hit can have multiple HSPs and you'd need to average or sum across HSP scores to find the "best" alignment.
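
As a rough sketch of one way to do this, the function below scores each hit by the sum of its HSP scores and keeps the top hit per query. Summing is only one possible definition of "best" (averaging, or taking the single strongest HSP, would be equally easy to swap in), and best_hit_per_query is a made-up name. It expects the records yielded by NCBIXML.parse, or any objects with the same query, alignments, hit_def, hsps, score and expect attributes.

```python
def best_hit_per_query(records):
    """Yield (query, hit_def, summed_score, lowest_evalue) for each query's top hit.

    Assumption: the "best" hit is the one with the largest sum of HSP scores.
    """
    for record in records:
        best = None
        for align in record.alignments:
            if not align.hsps:
                continue  # nothing to score for this hit
            total = sum(hsp.score for hsp in align.hsps)
            lowest_evalue = min(hsp.expect for hsp in align.hsps)
            # Keep the first hit seen when totals tie.
            if best is None or total > best[1]:
                best = (align.hit_def, total, lowest_evalue)
        if best is not None:  # queries without any hit are skipped
            yield (record.query,) + best
```

Because this filters after the search has run, it does not depend on how -max_target_seqs behaves.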


