Question: Best Blast Hit Without Pain
4
6.9 years ago by
Kap50
Kap50 wrote:

Hi,

This might be naive, but here I go. I have a set of target sequences and a query sequence, and I want to find the best BLAST hit (let's say by bit score). The straightforward way to do this is to run BLAST against the targets, parse the output file (looking at every hit one at a time), and then pick the best hit.

My question is: is there any other way to do this? For example, getting output that is already sorted by bit score and then retrieving the first hit?

EDIT (added information):

To make it clear, I want to search one (or more) sequences against a custom database using local NCBI BLAST (nucleotide vs nucleotide). I do not want to use the e-value as the criterion because the length of the alignment is important. That's why I would like to use the bit score. As per Pierre's link, what I need is -v. Thanks again for the comments.

Any suggestions are welcome.

best regards

ncbi blast • 13k views
modified 6.9 years ago by Andreas2.3k • written 6.9 years ago by Kap50
1

The only way to know which is the best alignment is to look at the alignments. If you bypass this step, you will get a lot of false positive results.

written 6.9 years ago by Giovanni M Dall'Olio25k
5
6.9 years ago by
Andreas2.3k
Singapore
Andreas2.3k wrote:

I think one trick is to use the tabular output format (-m 8) and then sort the hits yourself on the field you are interested in.

You will have to tell "sort" which field to use (the bit score should be field 12, I think) and to sort numerically with support for scientific notation (-g):

An example:

$ blastall -p blastn -i seq.fa -d db.fa -m 8 | sort -g -k 12

To get the best hit, just append | tail -n1 to the pipeline (the sort is ascending, so the highest bit score comes last).

Andreas

PS: Be warned: I only did a quick test!
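
A minimal sketch of the same idea for output that contains several queries. It assumes the 12-column -m 8 layout (query ID in field 1, bit score in field 12), tab-separated fields, and a sort that supports -g. The function name best_hit_per_query is made up for illustration.

```shell
# Hypothetical helper: reads tabular BLAST output (-m 8) on stdin and
# prints, for each query ID (field 1), the hit with the highest bit score
# (field 12).
best_hit_per_query() {
    tab=$(printf '\t')
    # Group by query, then order each group by bit score, highest first.
    # -g is used because large scores may be printed in scientific notation.
    LC_ALL=C sort -t "$tab" -k1,1 -k12,12gr |
        # Keep only the first line seen for each query ID.
        awk -F '\t' '!seen[$1]++'
}
```

It could be used as: blastall -p blastn -i seq.fa -d db.fa -m 8 | best_hit_per_query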

written 6.9 years ago by Andreas2.3k
2
6.9 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum101k wrote:

The hits are already sorted by e-value, so why would you want to use another field? To extract the first hit, see this previous question: http://biostar.stackexchange.com/questions/2869/standalone-blast-options

modified 4.2 years ago by Istvan Albert ♦♦ 74k • written 6.9 years ago by Pierre Lindenbaum101k
1

I think he means that he runs BLAST between a pair of sequences, not with one sequence against a custom database. I'm pretty sure BLAST allows the latter, but it's been ages since I last ran it and I don't remember how to do so.

written 6.9 years ago by Jorge Amigo10.0k
1

Hi Pierre and Jorge,

Thanks for your comments.

To make it clear, I want to search one (or more) sequences against a custom database using local NCBI BLAST (nucleotide vs nucleotide).

I do not want to use the e-value as the criterion because the length of the alignment is important. That's why I would like to use the bit score.

As per Pierre's link, what I need is -v.

Thanks again for the comments. Any more comments are very welcome.

best

written 6.9 years ago by Kap50
1

As Andreas wrote, I select hits in a similar way, using e-value and % identity: I sort the BLAST output file (format -m 8) on those fields and then sort again with the unique option on the first (query) column.
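
One way this two-step sort might look, as a sketch only. It assumes tab-separated -m 8 output with % identity in field 3 and e-value in field 11, and it relies on the second sort (stable, unique) keeping the first line of each group, which is how GNU sort behaves. The function name top_hit_by_evalue is hypothetical.

```shell
# Hypothetical helper: reads tabular BLAST output (-m 8) on stdin and keeps
# one hit per query, preferring the lowest e-value and, on ties, the
# highest % identity.
top_hit_by_evalue() {
    tab=$(printf '\t')
    # Pass 1: group by query, lowest e-value first, then highest identity.
    LC_ALL=C sort -t "$tab" -k1,1 -k11,11g -k3,3gr |
        # Pass 2: stable unique sort on the query column only, so the first
        # (best) line of each query group is the one that survives.
        LC_ALL=C sort -t "$tab" -s -u -k1,1
}
```

For example: top_hit_by_evalue < hits.tsv, where hits.tsv is a placeholder name for a saved -m 8 output file.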

written 6.0 years ago by Maciej Jończyk590