How To Sort A Ncbiwww.Qblast Output By Score?
2
0
Entering edit mode
9.1 years ago
david ▴ 10

I try to write a Biopython script that performs a blast search and returns the id of the first hit. Although my python skills are rather rudimentary, this seems to work. But for my specific purpose it would be better if the script returns the first hit according to Blast score, not E value. Is it possible to sort the blast output by score first? Cheers david

biopython • 4.3k views
2
Entering edit mode

BLAST output lists are sorted by decreasing scores. E-values are calculated directly from normalized scores.

3
Entering edit mode
9.1 years ago
Peter 6.0k

In your code snippet, record.alignments is just a list so you could sort it, for example using as the sort key the alignment with the highest HSP bitscore. Try:

from Bio.Blast import NCBIXML
handle = ...
for record in NCBIXML.parse(handle):
if record.alignments:
#Resort using bit score rather than e-value
record.alignments.sort(key = lambda align: -max(hsp.score for hsp in align.hsps))
print record.alignments[0].hit_id


Perhaps too much magic? I'm sorting using the negative of the max bitscore as a shorthand for using the sort method's reverse=True option. i.e.

from Bio.Blast import NCBIXML
handle = ...
for record in NCBIXML.parse(handle):
if record.alignments:
#Resort using bit score rather than e-value
record.alignments.sort(key = lambda align: max(hsp.score for hsp in align.hsps), reverse=True)
print record.alignments[0].hit_id


The other tricks here are using a lambda anonymous (unnamed) function, and a simple generator expression to find the best HSP bitscore.

Note that this may not do what you want if there is a good hit but split into two HSPs.

0
Entering edit mode

Thanks a lot Peter! The magic completely solved it :). maybe you can help me with an even easier problem. I now get the best hit ID as a string looking like this: gi|3318718|pdb|1AMU|A. I want to use it to fetch the corresponding sequence with Entrez.efetch. In order to do this I would need to get the number following "gi". How do you do that? Maybe one could separate the string however I'm not really understanding which ID a blast search returns, or in other words if the gi ID is always the first one. Thanks

0
Entering edit mode
9.1 years ago
Lukas Pravda ▴ 80

What is the output you are getting? I assume it is a *.blast file, html or something like that. I'm afraid that you will need to parse the outputs into to objects and then sort them according to the score. This is rather IT approach. I'd bet there are plenty of blast output parsers, otherwise you will need to write one on your own.

0
Entering edit mode

the blast output is a xml file. i then use the ncbixml parser. i need the id of the best hit. however, this best hit is always according to e value, not to score

result_handle = NCBIWWW.qblast("blastp","nr", query_str)
blast_records = NCBIXML.parse(result_handle)

for record in blast_records:
id_query = record.alignments[0].hit_id