Obtaining the top matches from blast
8.3 years ago
I have downloaded the current version of stand-alone BLAST (ncbi-blast-2.2.29+) and I am trying to use blastn to find similarities for a group of nucleotide sequences that I have. However, I am interested in only the top 3 matches. Searching online, I saw some posts suggesting -K, but I realized this does not work with the new version I am using. I looked at the help document and tried -max_target_seqs and -num_alignments, but neither of them worked: the result still contains all the matches found by BLAST.

Does anyone know how to limit the results to, let's say, just the top 3 matches?

Could you please explain a bit more about the sorting technique that has been referred to in this thread?

8.3 years ago
hpmcwill

Depends what you are trying to do.

As Neilfws says, if you want to limit the number of hits reported you can use (from the NCBI BLAST+ help output):

 -num_descriptions <Integer, >=0>
   Number of database sequences to show one-line descriptions for
   Not applicable for outfmt > 4
   Default = 500
    * Incompatible with:  max_target_seqs
 -num_alignments <Integer, >=0>
   Number of database sequences to show alignments for
   Default = 250
    * Incompatible with:  max_target_seqs


These correspond to the '-v' and '-b' options in legacy NCBI BLAST:

  -v  Number of database sequences to show one-line descriptions for (V) [Integer]
        default = 500
  -b  Number of database sequence to show alignments for (B) [Integer]
        default = 250


The '-K' option in legacy NCBI BLAST:

  -K  Number of best hits from a region to keep. Off by default.
        If used a value of 100 is recommended.  Very high values of -v or -b is also suggested [Integer]


Is slightly different and maps to the '-culling_limit' parameter in NCBI BLAST+:

 -culling_limit <Integer, >=0>
   If the query range of a hit is enveloped by that of at least this many
   higher-scoring hits, delete the hit
    * Incompatible with:  best_hit_overhang, best_hit_score_edge
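To make the culling rule concrete, here is a naive Python sketch of the sentence in the help text ("if the query range of a hit is enveloped by that of at least this many higher-scoring hits, delete the hit"). It is an illustration only, not how BLAST+ implements culling internally: the `(query_start, query_end, bitscore)` tuples and the name `cull_hits` are hypothetical, every hit is compared against every strictly higher-scoring hit in O(n²) time, and the sketch simply rejects a limit below 1.

```python
from typing import List, Tuple

# Hypothetical hit representation: (query_start, query_end, bitscore)
Hit = Tuple[int, int, float]


def cull_hits(hits: List[Hit], culling_limit: int) -> List[Hit]:
    """Delete a hit if its query range is enveloped by the query ranges of
    at least `culling_limit` higher-scoring hits (naive O(n^2) sketch)."""
    if culling_limit < 1:
        raise ValueError("this sketch expects culling_limit >= 1")
    kept: List[Hit] = []
    for i, (start, end, score) in enumerate(hits):
        lo, hi = min(start, end), max(start, end)  # tolerate reversed coordinates
        enveloping = 0
        for j, (o_start, o_end, o_score) in enumerate(hits):
            if j == i or o_score <= score:
                continue  # only strictly higher-scoring hits count
            if min(o_start, o_end) <= lo and hi <= max(o_start, o_end):
                enveloping += 1
        if enveloping < culling_limit:
            kept.append((start, end, score))
    return kept


if __name__ == "__main__":
    hits = [(1, 100, 200.0), (10, 90, 150.0), (20, 80, 100.0), (200, 300, 50.0)]
    print(cull_hits(hits, 2))  # [(1, 100, 200.0), (10, 90, 150.0), (200, 300, 50.0)]
```

With a limit of 2, the hit at 20-80 is dropped because two higher-scoring hits cover it; the hit at 10-90 survives because only one does.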


You may also want to limit the number of matches reported per hit (i.e. limit the number of HSPs):

 -max_hsps <Integer, >=0>
   Set maximum number of HSPs per subject sequence to save (0 means no limit)
   Default = 0
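If you already have a tabular (-outfmt 6) result and want to see the effect of an HSP cap without rerunning the search, it can be approximated in post-processing. This Python sketch keeps at most `max_hsps` rows per query/subject pair (columns 1 and 2 of the tabular output) and, like the help text, treats 0 as "no limit". The function name is hypothetical, it assumes each pair's rows are already ordered best-first, and it is only an approximation of the option, not the option itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def limit_hsps(lines: Iterable[str], max_hsps: int) -> List[str]:
    """Keep at most `max_hsps` rows per (qseqid, sseqid) pair of -outfmt 6 output.

    0 means no limit. Assumes each pair's rows are already ordered best-first.
    """
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    kept: List[str] = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        pair = (fields[0], fields[1])  # query ID, subject ID
        if max_hsps == 0 or counts[pair] < max_hsps:
            kept.append(line)
        counts[pair] += 1
    return kept
```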


Thank you very much, hpmcwill! I am sorry my post was not clear enough; I was looking to limit the number of matches reported per hit, so -max_hsps did the job.

5.2 years ago
shinken123

Using BLAST output generated with the option -outfmt 6:

awk '!seen[$1]++' Blast_output_file.txt > Besthit_Blast_output_file.txt

Could you explain your awk command, please? I am very interested in it!
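The awk one-liner works because `seen[$1]++` returns the count before incrementing it: the first time a query ID (column 1) appears the value is 0, `!0` is true, and awk's default action prints the line; every later line for that query gives a non-zero count and is skipped. So it keeps the first line per query, which is the best hit only if each query's hits were written best-first. The same pattern with a threshold, `awk 'seen[$1]++ < 3'`, keeps the first three lines per query. The Python function below is a hypothetical helper (not part of any BLAST tool) that mirrors this logic, with `n` generalising the threshold; note that it splits on tabs, whereas awk splits on any whitespace by default.

```python
from typing import Dict, Iterable, List


def first_n_per_query(lines: Iterable[str], n: int = 1) -> List[str]:
    """Python equivalent of awk '!seen[$1]++' (n=1) for -outfmt 6 rows."""
    seen: Dict[str, int] = {}
    kept: List[str] = []
    for line in lines:
        qseqid = line.rstrip("\n").split("\t")[0]  # awk's $1
        count = seen.get(qseqid, 0)  # old value, like awk's postfix ++
        if count < n:
            kept.append(line)
        seen[qseqid] = count + 1
    return kept
```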

8.3 years ago
Neilfws

The relevant options are in the BLAST handbook:

Option            Type     Default  Description
num_descriptions  integer  500      Show one-line descriptions for this number of database sequences.
num_alignments    integer  250      Show alignments for this number of database sequences.

8.3 years ago
edrezen

Which output format are you using? I think these options may not work with the default BLAST output format.

If you try the tabular output format (just add -outfmt 6 to your command), it may work better.
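A minimal sketch of assembling such a command in Python, combining tabular output with -max_target_seqs (used on its own, since the help text above lists -num_descriptions and -num_alignments as incompatible with it). The file and database names are placeholders and the helper name is hypothetical. Capping the number of target sequences during the search is not guaranteed to give exactly the same hits as sorting the full result afterwards, so comparing against a post-hoc sort is a sensible check.

```python
import shlex
from typing import List


def build_blastn_cmd(query: str, db: str, out: str, max_targets: int = 3) -> List[str]:
    """Assemble a blastn argument list with tabular output and a target cap."""
    return [
        "blastn",
        "-query", query,
        "-db", db,
        "-out", out,
        "-outfmt", "6",                        # tabular: one HSP per line
        "-max_target_seqs", str(max_targets),  # max database sequences to keep
    ]


if __name__ == "__main__":
    # Placeholder file and database names.
    cmd = build_blastn_cmd("queries.fasta", "my_db", "hits.tsv")
    print(shlex.join(cmd))
    # To run it where blastn is on PATH: subprocess.run(cmd, check=True)
```

Building an argument list rather than a single string avoids shell quoting problems when the list is passed to `subprocess.run`.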

Changing the format to option 6 didn't help.

8.3 years ago
Whoknows

Please run your query with -outfmt 6. The tabular output lets you select the hits with the highest similarity and also shows the number of mismatches; then sort it.
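With tabular output, the sorting approach becomes a per-query sort on the bit-score column. This Python sketch groups -outfmt 6 rows by query ID, sorts each group by bit score from highest to lowest, and keeps the top `n` (3 by default, matching the question). The function name is hypothetical, and the column index assumes the default 12-column layout, where bitscore is column 12; adjust `BITSCORE_COL` if you requested a custom column list.

```python
from typing import Dict, Iterable, List

BITSCORE_COL = 11  # 0-based index of bitscore in the default 12-column -outfmt 6


def top_n_by_bitscore(lines: Iterable[str], n: int = 3) -> List[str]:
    """Return the n highest-scoring rows per query, queries in first-seen order."""
    groups: Dict[str, List[List[str]]] = {}  # dicts keep insertion order (3.7+)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        groups.setdefault(fields[0], []).append(fields)
    result: List[str] = []
    for rows in groups.values():
        # Highest bit score first; the sort is stable, so ties keep input order.
        rows.sort(key=lambda f: float(f[BITSCORE_COL]), reverse=True)
        result.extend("\t".join(f) for f in rows[:n])
    return result
```

Sorting explicitly means the result does not depend on the order in which BLAST wrote the rows, unlike keeping the first lines per query.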

You can also use -best_hit_overhang to find the best hits within the BLAST run itself.

Thanks for the response. I knew I could sort and pick the top hit, but I thought there should be a parameter for BLAST that limits the results (at least there was one in an older version).

