Parsing psiblast xml with BioPython
3.6 years ago
effidotpy ▴ 20

Hi there,

I'm trying to use BioPython to parse the XML output generated by psiblast (local execution, -outfmt 5). I want to identify the iteration round of each record, which should be one of the attributes of the Bio.Blast.Record.PSIBlast class, so that I can extract the information for the last iteration only. However, I can't extract this 'round' information. Moreover, I think the parser is not picking the right class, since Bio.Blast.Record.PSIBlast and Bio.Blast.Record.Blast are two different classes and I only ever get the latter:

from Bio.Blast import NCBIXML

# read the xml; keep the file open while iterating,
# because NCBIXML.parse yields records lazily
with open("test.xml") as handle:
    # parse the file and print each record to see what's going on
    for record in NCBIXML.parse(handle):
        print(record)

>>>    <Bio.Blast.Record.Blast object at 0x7fce35a6f6d0>
>>>    <Bio.Blast.Record.Blast object at 0x7fce359ba650>
>>>    <Bio.Blast.Record.Blast object at 0x7fce358c1610>

I have also tried the xml2 output from psiblast (-outfmt 16) with the same result.

According to the class diagrams shown in the documentation, I should be able to retrieve the round information if the object is of the Bio.Blast.Record.PSIBlast class. That fails, although I am able to retrieve the 'alignments' information:

for record in records:
    print(record.rounds)

>>> AttributeError: 'Blast' object has no attribute 'rounds'

for record in records:
    print(record.alignments)

>>> <Bio.Blast.Record.Alignment object at 0x7fce35fee390>, <Bio.Blast.Record.Alignment object at 0x7 ...

I'm sure I must be missing something. 'rounds' is supposed to be one of the attributes of the Bio.Blast.Record.PSIBlast class, so I don't know why I cannot retrieve this information.

Any advice is welcome.

biopython psiblast parse • 1.1k views

Can you please post a snippet of the XML?


Thank you for your response Pierre. Sure, you can look at or download it here.


there is no "round" in the xml.

grep round tmp.xml

but there is Iteration_iter-num
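Building on that observation, here is a minimal sketch of pulling out only the last iteration by reading Iteration_iter-num directly with the standard library, without going through the Bio.Blast.Record classes. It assumes a single-query run and the usual -outfmt 5 layout (BlastOutput_iterations / Iteration / Iteration_hits / Hit). The embedded sample document, the hit IDs, and the helper name last_iteration_hits are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a psiblast -outfmt 5 file (hypothetical hit IDs).
SAMPLE_XML = """<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_hits>
        <Hit><Hit_id>hit_A</Hit_id></Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>2</Iteration_iter-num>
      <Iteration_hits>
        <Hit><Hit_id>hit_A</Hit_id></Hit>
        <Hit><Hit_id>hit_B</Hit_id></Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def last_iteration_hits(source):
    """Return (iteration number, list of Hit_id) for the highest Iteration_iter-num.

    Assumes one query per file, so the highest number is the final round.
    """
    root = ET.parse(source).getroot()
    iterations = root.findall("./BlastOutput_iterations/Iteration")
    if not iterations:
        raise ValueError("no <Iteration> elements found")
    # Pick the iteration whose Iteration_iter-num is largest.
    last = max(iterations, key=lambda it: int(it.findtext("Iteration_iter-num")))
    hit_ids = [hit.findtext("Hit_id") for hit in last.findall("./Iteration_hits/Hit")]
    return int(last.findtext("Iteration_iter-num")), hit_ids


# Works the same with a real file path, e.g. last_iteration_hits("test.xml").
round_number, hit_ids = last_iteration_hits(io.StringIO(SAMPLE_XML))
print(round_number, hit_ids)
```

With several queries in one file the iteration numbers would probably have to be grouped per query first; that case is not handled here.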


You are right, there is no 'round' in the XML, but there is no 'alignments' node either, and yet I can get it from the record. I assume the NCBIXML parser reads the XML and assigns the XML nodes to objects of the Bio.Blast.Record.Blast or Bio.Blast.Record.PSIBlast class. These classes and their attributes are depicted here.
