How To Sort A Ncbiwww.Qblast Output By Score?
2
0
Entering edit mode
11.4 years ago
david ▴ 10

I try to write a Biopython script that performs a blast search and returns the id of the first hit. Although my python skills are rather rudimentary, this seems to work. But for my specific purpose it would be better if the script returns the first hit according to Blast score, not E value. Is it possible to sort the blast output by score first? Cheers david

biopython • 5.4k views
ADD COMMENT
2
Entering edit mode

BLAST output lists are sorted by decreasing scores. E-values are calculated directly from normalized scores.

ADD REPLY
3
Entering edit mode
11.4 years ago
Peter 6.0k

In your code snippet, record.alignments is just a list so you could sort it, for example using as the sort key the alignment with the highest HSP bitscore. Try:

from Bio.Blast import NCBIXML
handle = ...
for record in NCBIXML.parse(handle):
    if record.alignments:
        #Resort using bit score rather than e-value
        record.alignments.sort(key = lambda align: -max(hsp.score for hsp in align.hsps))
        print record.alignments[0].hit_id

Perhaps too much magic? I'm sorting using the negative of the max bitscore as a shorthand for using the sort method's reverse=True option. i.e.

from Bio.Blast import NCBIXML
handle = ...
for record in NCBIXML.parse(handle):
    if record.alignments:
        #Resort using bit score rather than e-value
        record.alignments.sort(key = lambda align: max(hsp.score for hsp in align.hsps), reverse=True)
        print record.alignments[0].hit_id

The other tricks here are using a lambda anonymous (unnamed) function, and a simple generator expression to find the best HSP bitscore.

Note that this may not do what you want if there is a good hit but split into two HSPs.

ADD COMMENT
0
Entering edit mode

Thanks a lot Peter! The magic completely solved it :). maybe you can help me with an even easier problem. I now get the best hit ID as a string looking like this: gi|3318718|pdb|1AMU|A. I want to use it to fetch the corresponding sequence with Entrez.efetch. In order to do this I would need to get the number following "gi". How do you do that? Maybe one could separate the string however I'm not really understanding which ID a blast search returns, or in other words if the gi ID is always the first one. Thanks

ADD REPLY
0
Entering edit mode
11.4 years ago
Lukas Pravda ▴ 80

What is the output you are getting? I assume it is a *.blast file, html or something like that. I'm afraid that you will need to parse the outputs into to objects and then sort them according to the score. This is rather IT approach. I'd bet there are plenty of blast output parsers, otherwise you will need to write one on your own.

Or you might find the answer in the following webpage http://seqanswers.com/forums/showthread.php?t=6869

ADD COMMENT
0
Entering edit mode

the blast output is a xml file. i then use the ncbixml parser. i need the id of the best hit. however, this best hit is always according to e value, not to score

result_handle = NCBIWWW.qblast("blastp","nr", query_str)
blast_records = NCBIXML.parse(result_handle)

for record in blast_records:
    id_query = record.alignments[0].hit_id
ADD REPLY

Login before adding your answer.

Traffic: 2415 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6