Question: how to parse blast output using biopython
1
gravatar for camilla.borgonuovo
3.6 years ago by
European Union
camilla.borgonuovo10 wrote:

Dear all,

I am new to Biopython and have tried to parse my output blast file.

My script is:

from Bio.Blast import NCBIXML
result=open("blast_anoC.xml","r")
records= NCBIXML.parse(result)
item=next(records)
for alignment in item.alignments:
          for hsp in alignment.hsps:
                 if hsp.expect <0.01:
                         print('****Alignment****')
                         print('sequence:', alignment.title) 
                         print('length:', alignment.length)
                         print('score:', alignment.score)
                         print('gaps:', alignment.gaps)
                         print('e value:', hsp.expect)
                         print(hsp.query[0:90] + '...')
                         print(hsp.match[0:90] + '...')
                         print(hsp.sbjct[0:90] + '...')

When I run it, Biopython executes it but doesn't print nothing...Why?

Many thanks,

Camilla

 

blast biopython • 8.4k views
ADD COMMENTlink modified 9 months ago by krish.krishnan670 • written 3.6 years ago by camilla.borgonuovo10

Is there a reason you're using XML output? A tab-delimited output (-outfmt 6), would be much easier to parse with standard python, or Linux commands. You can also run BLAST from inside your python script with BioPython.

ADD REPLYlink written 9 months ago by st.ph.n1.9k
1

Biopython doesn't totally support the plain text parsing now, not enough stable, so XML is the best recommanded output for Biopython parsing.

ADD REPLYlink written 9 months ago by Rob90
4
gravatar for umer.zeeshan.ijaz
3.6 years ago by
Glasgow, UK
umer.zeeshan.ijaz1.7k wrote:

Hi Camilla,

There are two BUGS in your code. score and gaps are not member of alignment object. Correct code should be:

print('score:', hsp.score)
print('gaps:', hsp.gaps)

Use dir() to check members:

>>> dir(item.alignments[0])
['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'accession', 'hit_def', 'hit_id', 'hsps', 'length', 'title']
>>> dir(item.alignments[0].hsps[0])
['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'align_length', 'bits', 'expect', 'frame', 'gaps', 'identities', 'match', 'num_alignments', 'positives', 'query', 'query_end', 'query_start', 'sbjct', 'sbjct_end', 'sbjct_start', 'score', 'strand']

And here is how your code would run if you fix the bugs (I am using it on my BLAST file):

>>> for alignment in item.alignments:
...     for hsp in alignment.hsps:
...             if hsp.expect < 0.01:
...                     print('****Alignment****')
...                     print('sequence:', alignment.title)
...                     print('length:', alignment.length)
...                     print('score:', hsp.score)
...                     print('gaps:', hsp.gaps)
...                     print(hsp.query[0:90] + '...')
...                     print(hsp.match[0:90] + '...')
...                     print(hsp.sbjct[0:90] + '...')
...
****Alignment****
('sequence:', u'gi|568824607|ref|XM_006466626.1| PREDICTED: Citrus sinensis cold-regulated 413 plasma membrane protein 2-like (LOC102620025), transcript variant X5, mRNA')
('length:', 878)
('score:', 412.0)
('gaps:', 11)
AAATGGGGAGAGAAATGAAGTACTTGGCCATGAAAACTGATCAATTGGCCGTGGCTAATATGATCGATTCCGATATCAATGAGCTTAAAA...
|||||||||||    |||| || ||||| |||||||||||| |   ||  |  ||  |  |||||   || ||| ||||||||||||| |...
AAATGGGGAGAT---TGAATTATTTGGCTATGAAAACTGATGATCAGGTTGCAGCAGAGTTGATCAGCTCTGATTTCAATGAGCTTAAGA...
****Alignment****
('sequence:', u'gi|568824605|ref|XM_006466625.1| PREDICTED: Citrus sinensis cold-regulated 413 plasma membrane protein 2-like (LOC102620025), transcript variant X4, mRNA')
('length:', 911)
('score:', 412.0)
('gaps:', 11)
AAATGGGGAGAGAAATGAAGTACTTGGCCATGAAAACTGATCAATTGGCCGTGGCTAATATGATCGATTCCGATATCAATGAGCTTAAAA...
|||||||||||    |||| || ||||| |||||||||||| |   ||  |  ||  |  |||||   || ||| ||||||||||||| |...
AAATGGGGAGAT---TGAATTATTTGGCTATGAAAACTGATGATCAGGTTGCAGCAGAGTTGATCAGCTCTGATTTCAATGAGCTTAAGA...
****Alignment****
('sequence:', u'gi|568824603|ref|XM_006466624.1| PREDICTED: Citrus sinensis cold-regulated 413 plasma membrane protein 2-like (LOC102620025), transcript variant X3, mRNA')
('length:', 894)
('score:', 412.0)
('gaps:', 11)
AAATGGGGAGAGAAATGAAGTACTTGGCCATGAAAACTGATCAATTGGCCGTGGCTAATATGATCGATTCCGATATCAATGAGCTTAAAA...
|||||||||||    |||| || ||||| |||||||||||| |   ||  |  ||  |  |||||   || ||| ||||||||||||| |...
AAATGGGGAGAT---TGAATTATTTGGCTATGAAAACTGATGATCAGGTTGCAGCAGAGTTGATCAGCTCTGATTTCAATGAGCTTAAGA...
****Alignment****
('sequence:', u'gi|568824601|ref|XM_006466623.1| PREDICTED: Citrus sinensis cold-regulated 413 plasma membrane protein 2-like (LOC102620025), transcript variant X2, mRNA')
('length:', 922)
('score:', 412.0)
('gaps:', 11)
AAATGGGGAGAGAAATGAAGTACTTGGCCATGAAAACTGATCAATTGGCCGTGGCTAATATGATCGATTCCGATATCAATGAGCTTAAAA...
|||||||||||    |||| || ||||| |||||||||||| |   ||  |  ||  |  |||||   || ||| ||||||||||||| |...
AAATGGGGAGAT---TGAATTATTTGGCTATGAAAACTGATGATCAGGTTGCAGCAGAGTTGATCAGCTCTGATTTCAATGAGCTTAAGA...


Best Wishes,

Umer

 

 

 

ADD COMMENTlink written 3.6 years ago by umer.zeeshan.ijaz1.7k
1
gravatar for Peter
3.6 years ago by
Peter5.6k
Scotland, UK
Peter5.6k wrote:

Your code only looks at the first BLAST query, and presumably that gave no matches (under the e-value threshold). You could replace ``item=next(records)`` with ``for item in records:`` (and indent the rest of the code) to try looking at all the records.

ADD COMMENTlink written 3.6 years ago by Peter5.6k
0
gravatar for mccannj0908
3.6 years ago by
Austria
mccannj09080 wrote:

Maybe the threshold value in the if statement is too small.

ADD COMMENTlink written 3.6 years ago by mccannj09080
0
gravatar for camilla.borgonuovo
3.6 years ago by
European Union
camilla.borgonuovo10 wrote:

I also tried with "hsp.expect <100"

...but is the same

:(

ADD COMMENTlink written 3.6 years ago by camilla.borgonuovo10

did the blast return any significant hits?

ADD REPLYlink written 3.6 years ago by mccannj09080
0
gravatar for krish.krishnan67
9 months ago by
krish.krishnan670 wrote:

I used the same script above to parse my blast output in XML. The error in the script was the line item=next(records). Instead I used item=records.next() and I got the output on screen for the same e-value without any compromise. Maybe you should try this!

ADD COMMENTlink written 9 months ago by krish.krishnan670

I think you are 2 years late.. And everything about biopython is writen here

ADD REPLYlink modified 9 months ago • written 9 months ago by Rob90
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1016 users visited in the last hour