Parsing psiblast xml with BioPython
14 months ago
effidotpy ▴ 20

Hi there,

I'm trying to use BioPython to parse the XML output generated by psiblast (local execution, -outfmt 5). I want to identify each 'iteration round', which should be one of the attributes of the Bio.Blast.Record.PSIBlast class, so that I can extract the information for the last iteration only. However, I can't extract this 'round' information. Moreover, I think the parser is not picking the right class: Bio.Blast.Record.PSIBlast and Bio.Blast.Record.Blast are two different classes, and I only get the latter:

from Bio.Blast import NCBIXML

handle = open("test.xml")

# parse the file to obtain 'records'
records = NCBIXML.parse(handle)

# print the records to see what's going on
for record in records:
    print(record)

>>>    <Bio.Blast.Record.Blast object at 0x7fce35a6f6d0>
>>>    <Bio.Blast.Record.Blast object at 0x7fce359ba650>
>>>    <Bio.Blast.Record.Blast object at 0x7fce358c1610>


I have also tried the xml2 output from psiblast (-outfmt 16) with the same result.

According to the class diagrams shown in the documentation, I should be able to retrieve the round information if the object were of the Bio.Blast.Record.PSIBlast class. That fails, although I am able to retrieve the 'alignments' information:

for record in records:
    print(record.rounds)

>>> AttributeError: 'Blast' object has no attribute 'rounds'

for record in records:
    print(record.alignments)

>>> <Bio.Blast.Record.Alignment object at 0x7fce35fee390>, <Bio.Blast.Record.Alignment object at 0x7 ...


I'm sure I must be missing something. 'rounds' is supposed to be one of the attributes of the Bio.Blast.Record.PSIBlast class, so I don't know why I cannot retrieve this information.

biopython psiblast parse • 432 views

Can you please post a snippet of the XML?


there is no "round" in the xml.

grep round tmp.xml


but there is Iteration_iter-num


You are right, there is no 'round' in the XML, but there is no 'alignments' node either, and yet I can get it from the record. I assume the NCBIXML parser reads the XML and assigns the XML nodes to objects of the Bio.Blast.Record.Blast or Bio.Blast.Record.PSIBlast class. These classes and their attributes are depicted here.
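
Since the XML does carry Iteration_iter-num, one possible workaround is to read that node directly rather than going through a 'rounds' attribute. The following is only a rough sketch using the standard library's xml.etree.ElementTree instead of NCBIXML. It assumes a single query, so that the highest Iteration_iter-num is the last psiblast round. The SAMPLE_XML string, the hit IDs, and the helper names last_iteration and hit_ids are made up for illustration; real -outfmt 5 output has many more nodes.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written stand-in for psiblast -outfmt 5 output (not real results).
SAMPLE_XML = """<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_program>psiblast</BlastOutput_program>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_hits>
        <Hit><Hit_id>hypothetical_hit_A</Hit_id></Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>2</Iteration_iter-num>
      <Iteration_hits>
        <Hit><Hit_id>hypothetical_hit_A</Hit_id></Hit>
        <Hit><Hit_id>hypothetical_hit_B</Hit_id></Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def last_iteration(root):
    """Return the <Iteration> element with the highest Iteration_iter-num."""
    iterations = root.findall("./BlastOutput_iterations/Iteration")
    if not iterations:
        raise ValueError("no <Iteration> elements found")
    return max(iterations, key=lambda it: int(it.findtext("Iteration_iter-num")))


def hit_ids(iteration):
    """Collect the Hit_id text of every hit reported in one iteration."""
    return [hit.findtext("Hit_id") for hit in iteration.findall("./Iteration_hits/Hit")]


# For a real file, ET.parse("test.xml").getroot() would replace this line.
root = ET.fromstring(SAMPLE_XML)
final = last_iteration(root)
final_round = int(final.findtext("Iteration_iter-num"))
final_hits = hit_ids(final)
print(final_round, final_hits)
```

If the file holds several queries, the iterations would probably need to be grouped per query first (for example by Iteration_query-ID) before taking the maximum; that case is not handled here.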