Question: Parsing psiblast XML with Biopython
effidotpy wrote:

Hi there,

I'm trying to use Biopython to parse the XML output of a local psiblast run (-outfmt 5). I want to identify each iteration round, which should be available as the 'rounds' attribute of the Bio.Blast.Record.PSIBlast class, so that I can extract the results of the last iteration only. However, I can't get at this 'round' information. Moreover, I think the parser is not picking the right class: it returns Bio.Blast.Record.Blast objects, and Bio.Blast.Record.PSIBlast and Bio.Blast.Record.Blast are two different classes:

from Bio.Blast import NCBIXML

# read the xml; the with-block closes the file when parsing is done
with open("test.xml") as handle:
    # parse the file to obtain 'records' (NCBIXML.parse returns a generator)
    records = NCBIXML.parse(handle)

    # print the records to see what's going on
    for record in records:
        print(record)

>>>    <Bio.Blast.Record.Blast object at 0x7fce35a6f6d0>
>>>    <Bio.Blast.Record.Blast object at 0x7fce359ba650>
>>>    <Bio.Blast.Record.Blast object at 0x7fce358c1610>

I have also tried the xml2 output from psiblast (-outfmt 16) with the same result.
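A note on the output above: as far as I can tell, NCBIXML never builds Bio.Blast.Record.PSIBlast objects at all; it creates one Bio.Blast.Record.Blast object per `<Iteration>` element of the XML. Assuming a single query, the three objects printed above would then be the three psiblast rounds in order, and the last round is simply the last record. The minimal sketch below takes that record from the generator with a `deque(maxlen=1)`; `last_record` and `last_psiblast_round` are hypothetical helper names, and `test.xml` is the file from the question.

```python
from collections import deque


def last_record(records):
    """Return the final item of an iterable without keeping the others in memory.

    NCBIXML.parse() returns a generator, which cannot be indexed with [-1];
    a deque with maxlen=1 retains only the most recent item while consuming it.
    """
    tail = deque(records, maxlen=1)
    if not tail:
        raise ValueError("no records found")
    return tail[0]


def last_psiblast_round(xml_path):
    """Return the Blast record built from the last <Iteration> of a psiblast XML file.

    Assumes a single query, so records come out in round order.
    """
    # Imported here so that last_record() works even without Biopython installed.
    from Bio.Blast import NCBIXML

    with open(xml_path) as handle:
        # the generator is fully consumed before the file is closed
        return last_record(NCBIXML.parse(handle))
```

For example, `final = last_psiblast_round("test.xml")` followed by `print(final.alignments)` would show only the alignments of the final round, if the assumption about one record per iteration holds.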

According to the class diagrams shown in the documentation, a Bio.Blast.Record.PSIBlast object should carry the round information. Accessing it fails, although I can retrieve the 'alignments' information:

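# re-parse: the generator from NCBIXML.parse can only be consumed once
with open("test.xml") as handle:
    for record in NCBIXML.parse(handle):
        print(record.rounds)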

>>> AttributeError: 'Blast' object has no attribute 'rounds'

with open("test.xml") as handle:
    for record in NCBIXML.parse(handle):
        print(record.alignments)

>>> <Bio.Blast.Record.Alignment object at 0x7fce35fee390>, <Bio.Blast.Record.Alignment object at 0x7 ...
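The Alignment objects printed above can be unpacked further. A small sketch, assuming the standard Bio.Blast.Record attribute names (`alignments`, `title`, `hsps`, `expect`); `best_hits` and the 1e-5 cut-off are illustrative choices, not part of Biopython, and the demo uses `SimpleNamespace` stand-ins so it runs without Biopython or a BLAST file.

```python
from types import SimpleNamespace


def best_hits(record, max_evalue=1e-5):
    """Return (title, best e-value) pairs for the alignments of one BLAST record.

    Relies only on the attribute names used by Bio.Blast.Record objects:
    record.alignments, alignment.title, alignment.hsps and hsp.expect.
    """
    results = []
    for alignment in record.alignments:
        # smallest e-value among this alignment's HSPs (inf if it has none)
        best = min((hsp.expect for hsp in alignment.hsps), default=float("inf"))
        if best <= max_evalue:
            results.append((alignment.title, best))
    return results


# Stand-in objects shaped like a Blast record, only for this demo.
fake = SimpleNamespace(alignments=[
    SimpleNamespace(title="hit 1", hsps=[SimpleNamespace(expect=1e-20),
                                         SimpleNamespace(expect=1e-8)]),
    SimpleNamespace(title="hit 2", hsps=[SimpleNamespace(expect=0.5)]),
])
print(best_hits(fake))  # [('hit 1', 1e-20)]
```

With real data, `best_hits(record)` would be called inside the `for record in NCBIXML.parse(handle)` loop.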

I'm sure I'm missing something. 'rounds' is documented as an attribute of the Bio.Blast.Record.PSIBlast class, so I don't understand why I cannot retrieve it.

Any advice is welcome.

Can you please post a snapshot of the XML?

(reply by Pierre Lindenbaum)

Thank you for your response, Pierre. Sure, you can view or download it here.

(reply by effidotpy)

There is no "round" in the XML:

grep round tmp.xml

(reply by Pierre Lindenbaum)

But there is Iteration_iter-num.

(reply by Pierre Lindenbaum)
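Following up on `Iteration_iter-num`: the round number can be read straight from the XML with the standard library, independently of Biopython. A minimal sketch using `xml.etree.ElementTree`; the XML string is a hand-written fragment in the BLAST XML (-outfmt 5) layout rather than the poster's file, and `iteration_numbers` and `last_iteration` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment following the BLAST XML layout; not real results.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>2</Iteration_iter-num>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>3</Iteration_iter-num>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def iteration_numbers(root):
    """Return the round numbers stored in <Iteration_iter-num>, in file order."""
    return [int(it.findtext("Iteration_iter-num")) for it in root.iter("Iteration")]


def last_iteration(root):
    """Return the <Iteration> element with the highest round number."""
    return max(root.iter("Iteration"),
               key=lambda it: int(it.findtext("Iteration_iter-num")))


root = ET.fromstring(SAMPLE_XML)
print(iteration_numbers(root))  # [1, 2, 3]
```

For the real file, `root = ET.parse("test.xml").getroot()` replaces the `fromstring` call; the hits of the final round are then the `Hit` elements under `last_iteration(root)`.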

You are right, there is no 'round' in the XML, but there is no 'alignments' node either, and yet I can get the alignments from the record. I assume the NCBIXML parser reads the XML and maps its nodes onto objects of the Bio.Blast.Record.Blast or Bio.Blast.Record.PSIBlast class. These classes and their attributes are depicted here.

(reply by effidotpy)
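To illustrate the point about 'alignments': that attribute is assembled by the parser from `<Hit>` elements, even though no `<alignments>` tag exists in the file. The sketch below is a deliberately simplified imitation of such a node-to-object mapping, not NCBIXML's actual code; the `Round`, `Alignment` and `Hsp` dataclasses are hypothetical and much smaller than the Bio.Blast.Record classes, and the XML is a hand-written sample. Here the round number is kept on each object, which is the field the question was looking for.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

# Hand-written fragment in the BLAST XML layout; not real results.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_hits>
        <Hit>
          <Hit_def>protein A</Hit_def>
          <Hit_hsps><Hsp><Hsp_evalue>1e-30</Hsp_evalue></Hsp></Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>2</Iteration_iter-num>
      <Iteration_hits>
        <Hit>
          <Hit_def>protein A</Hit_def>
          <Hit_hsps><Hsp><Hsp_evalue>1e-40</Hsp_evalue></Hsp></Hit_hsps>
        </Hit>
        <Hit>
          <Hit_def>protein B</Hit_def>
          <Hit_hsps><Hsp><Hsp_evalue>0.001</Hsp_evalue></Hsp></Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


@dataclass
class Hsp:
    evalue: float


@dataclass
class Alignment:
    hit_def: str
    hsps: List[Hsp] = field(default_factory=list)


@dataclass
class Round:
    number: int
    alignments: List[Alignment] = field(default_factory=list)


def parse_rounds(root):
    """Build one Round per <Iteration>; 'alignments' is filled from <Hit> elements."""
    rounds = []
    for it in root.iter("Iteration"):
        rnd = Round(number=int(it.findtext("Iteration_iter-num")))
        for hit in it.iter("Hit"):
            aln = Alignment(hit_def=hit.findtext("Hit_def"))
            for hsp in hit.iter("Hsp"):
                aln.hsps.append(Hsp(evalue=float(hsp.findtext("Hsp_evalue"))))
            rnd.alignments.append(aln)
        rounds.append(rnd)
    return rounds


rounds = parse_rounds(ET.fromstring(SAMPLE_XML))
final = max(rounds, key=lambda r: r.number)
print(final.number, [a.hit_def for a in final.alignments])  # 2 ['protein A', 'protein B']
```

Keeping the iteration number as a plain field, rather than relying on record order, makes picking the last round independent of how the records happen to be ordered.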