Problem Parsing XML in Biopython
12.1 years ago
Amno ▴ 10

Hi mates, I'm new to Python and I'm trying to parse the result of a local BLAST run. Here is the code:

from Bio.Blast.Applications import NcbiblastpCommandline
from Bio.Blast import NCBIStandalone
from Bio.Blast import NCBIXML

name = "virusprotein.fa"

data_base = "virusprot.fa"
out_file = "blast_test.xml"
blastp_cline = NcbiblastpCommandline(query=name, db=data_base, evalue=0.001, out=out_file)
print "hi" # code works until here to parse results in xml

result_handle = open("blast_test.xml")
blast_records = NCBIXML.parse(result_handle)
blast_record = blast_records.next()

# for alignment in blast_record.alignments:
#     for hsp in alignment.hsps:
#         if hsp.expect < evalue:
#             print 'Sequence:', alignment.title
#             print 'Length:', alignment.length
#             print 'E value:', hsp.expect
#             print hsp.query[0:50] + '...'
#             print hsp.match[0:50] + '...'
#             print hsp.sbjct[0:50] + '...'

As you can see in the commented-out code, it uses a module for parsing and a loop to print a summary. This works when I run BLAST over the Internet, but not when I run it locally. The code only works up to (print "hi"). When I try to execute the code below that line, it says:

Traceback (most recent call last):
  File "blast_all.py", line 14, in <module>
    blast_record = blast_records.next()
  File "/usr/local/lib/python2.7/dist-packages/biopython-1.58-py2.7-linux-i686.egg/Bio/Blast/NCBIXML.py", line 624, in parse
    % (XML_START, repr(text[:20])))
ValueError: Your XML file did not start with <?xml... but instead 'BLASTP 2.2.25+\n\n\nRef

This may be easy, but I have been stuck on it the whole day. Any suggestion is welcome. Thanks in advance.

python biopython blast parsing • 8.2k views
12.1 years ago

The error message from the XML parser indicates that 'blast_test.xml' is not actually an XML file. If you look at the file, you should see BLAST's default human-readable text report. To have blastp generate XML output, set '-outfmt' to 5 (outfmt=5 in the Biopython wrapper):

blastp_cline = NcbiblastpCommandline(query=name, db=data_base, evalue=0.001,
                                     outfmt=5, out=out_file)

The Biopython Tutorial has a full example in section 7.2.3:

http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc86
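
For comparison, here is a minimal sketch of the same fix without the Biopython wrapper. It assumes Python 3 and a BLAST+ 'blastp' binary on the PATH; the helper name build_blastp_cmd is hypothetical and only for illustration. It builds the command line with '-outfmt 5' so the output file is XML. Note also that in the code as posted the command object is only constructed, never called; with the Biopython wrapper the tool runs only when you call it, e.g. blastp_cline().

```python
import subprocess


def build_blastp_cmd(query, db, out, evalue=0.001):
    """Return the argument list for a blastp run that writes XML output."""
    return [
        "blastp",
        "-query", query,
        "-db", db,
        "-evalue", str(evalue),
        "-outfmt", "5",  # 5 = XML, the format NCBIXML.parse expects
        "-out", out,
    ]


cmd = build_blastp_cmd("virusprotein.fa", "virusprot.fa", "blast_test.xml")
print(" ".join(cmd))

# To actually run it (needs BLAST+ installed and a formatted database):
# subprocess.run(cmd, check=True)
```

The subprocess call is left commented out so the sketch can be inspected without BLAST+ installed.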

Thank you, man, you're great.

12.1 years ago
Chris Maloney ▴ 360

First rule in debugging code: "Read the error message!".
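
The check the error message describes can also be done before parsing. A minimal sketch, assuming Python 3 and only the standard library (the function name looks_like_xml is hypothetical), that tests whether a file starts with the '<?xml' prefix the parser was looking for:

```python
import os
import tempfile


def looks_like_xml(path, prefix=b"<?xml"):
    """Return True if the file at `path` begins with the XML declaration."""
    with open(path, "rb") as handle:
        return handle.read(len(prefix)) == prefix


# Demo on two throwaway files: one XML-like, one starting like the
# plain-text report quoted in the traceback.
with tempfile.TemporaryDirectory() as tmp:
    xml_path = os.path.join(tmp, "ok.xml")
    txt_path = os.path.join(tmp, "report.txt")
    with open(xml_path, "wb") as f:
        f.write(b'<?xml version="1.0"?>\n<BlastOutput></BlastOutput>\n')
    with open(txt_path, "wb") as f:
        f.write(b"BLASTP 2.2.25+\n\n\nRef")
    xml_ok = looks_like_xml(xml_path)
    txt_ok = looks_like_xml(txt_path)

print(xml_ok, txt_ok)  # prints: True False
```

Running this on 'blast_test.xml' before calling NCBIXML.parse would flag the wrong output format right away.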
