Question: How To Sort A Ncbiwww.Qblast Output By Score?
0
gravatar for david
5.6 years ago by
david10
david10 wrote:

I try to write a Biopython script that performs a blast search and returns the id of the first hit. Although my python skills are rather rudimentary, this seems to work. But for my specific purpose it would be better if the script returns the first hit according to Blast score, not E value. Is it possible to sort the blast output by score first? Cheers david

biopython • 2.8k views
ADD COMMENTlink modified 5.5 years ago by Peter5.6k • written 5.6 years ago by david10
2

BLAST output lists are sorted by decreasing scores. E-values are calculated directly from normalized scores.

ADD REPLYlink modified 5.5 years ago • written 5.5 years ago by a.zielezinski8.3k
3
gravatar for Peter
5.5 years ago by
Peter5.6k
Scotland, UK
Peter5.6k wrote:

In your code snippet, record.alignments is just a list so you could sort it, for example using as the sort key the alignment with the highest HSP bitscore. Try:

from Bio.Blast import NCBIXML
handle = ...
for record in NCBIXML.parse(handle):
    if record.alignments:
        #Resort using bit score rather than e-value
        record.alignments.sort(key = lambda align: -max(hsp.score for hsp in align.hsps))
        print record.alignments[0].hit_id

Perhaps too much magic? I'm sorting using the negative of the max bitscore as a shorthand for using the sort method's reverse=True option. i.e.

from Bio.Blast import NCBIXML
handle = ...
for record in NCBIXML.parse(handle):
    if record.alignments:
        #Resort using bit score rather than e-value
        record.alignments.sort(key = lambda align: max(hsp.score for hsp in align.hsps), reverse=True)
        print record.alignments[0].hit_id

The other tricks here are using a lambda anonymous (unnamed) function, and a simple generator expression to find the best HSP bitscore.

Note that this may not do what you want if there is a good hit but split into two HSPs.

ADD COMMENTlink modified 5.5 years ago • written 5.5 years ago by Peter5.6k

Thanks a lot Peter! The magic completely solved it :). maybe you can help me with an even easier problem. I now get the best hit ID as a string looking like this: gi|3318718|pdb|1AMU|A. I want to use it to fetch the corresponding sequence with Entrez.efetch. In order to do this I would need to get the number following "gi". How do you do that? Maybe one could separate the string however I'm not really understanding which ID a blast search returns, or in other words if the gi ID is always the first one. Thanks

ADD REPLYlink written 5.5 years ago by david10
0
gravatar for Lukas Pravda
5.6 years ago by
Lukas Pravda80
Brno
Lukas Pravda80 wrote:

What is the output you are getting? I assume it is a *.blast file, html or something like that. I'm afraid that you will need to parse the outputs into to objects and then sort them according to the score. This is rather IT approach. I'd bet there are plenty of blast output parsers, otherwise you will need to write one on your own.

Or you might find the answer in the following webpage http://seqanswers.com/forums/showthread.php?t=6869

ADD COMMENTlink written 5.6 years ago by Lukas Pravda80

the blast output is a xml file. i then use the ncbixml parser. i need the id of the best hit. however, this best hit is always according to e value, not to score

result_handle = NCBIWWW.qblast("blastp","nr", query_str)
blast_records = NCBIXML.parse(result_handle)

for record in blast_records:
    id_query = record.alignments[0].hit_id
ADD REPLYlink modified 5.6 years ago • written 5.6 years ago by david10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1589 users visited in the last hour