Obtaining the top matches from blast
8.3 years ago
S ▴ 100

Hi,

I have downloaded the current version of stand-alone BLAST (ncbi-blast-2.2.29+) and I am trying to use blastn to find similarities for a group of nucleotide sequences that I have. However, I am interested in only the top 3 matches. I searched online and saw some posts that suggest using -K, but I realized this does not work with the new version that I am using. I looked at the help document and tried -max_target_seqs and -num_alignments, but neither of them worked: the result still contains all the matches found by BLAST.

Does anyone know how to limit the results to, let's say, just the top 3 matches?

Thanks!

blast • 18k views

Could you please explain a bit more about the sorting technique that has been referred to in this thread?

8.3 years ago
hpmcwill ★ 1.2k

Depends what you are trying to do.

As Neilfws says, if you want to limit the number of hits reported you can use (from the NCBI BLAST+ help output):

 -num_descriptions <Integer, >=0>
   Number of database sequences to show one-line descriptions for
   Not applicable for outfmt > 4
   Default = `500'
    * Incompatible with:  max_target_seqs
 -num_alignments <Integer, >=0>
   Number of database sequences to show alignments for
   Default = `250'
    * Incompatible with:  max_target_seqs

These correspond to the '-v' and '-b' options in legacy NCBI BLAST:

  -v  Number of database sequences to show one-line descriptions for (V) [Integer]
    default = 500
  -b  Number of database sequence to show alignments for (B) [Integer]
    default = 250

The '-K' option in legacy NCBI BLAST:

  -K  Number of best hits from a region to keep. Off by default.
    If used a value of 100 is recommended.  Very high values of -v or -b is also suggested [Integer]

is slightly different and maps to the '-culling_limit' parameter in NCBI BLAST+:

 -culling_limit <Integer, >=0>
   If the query range of a hit is enveloped by that of at least this many
   higher-scoring hits, delete the hit
    * Incompatible with:  best_hit_overhang, best_hit_score_edge

You may also want to limit the number of matches reported per hit (i.e. limit the number of HSPs):

 -max_hsps <Integer, >=0>
   Set maximum number of HSPs per subject sequence to save (0 means no limit)
   Default = `0'
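
Putting the options above together, here is a minimal sketch of a command that keeps at most 3 subject sequences per query and 1 HSP per subject. The wrapper name `top3_blast` and the file and database names are placeholders I made up for illustration. It uses -max_target_seqs instead of -num_descriptions/-num_alignments because the help text quoted above says -num_descriptions is not applicable for outfmt > 4 and both are incompatible with max_target_seqs.

```shell
# Hypothetical wrapper around blastn (the function name is an assumption,
# not part of BLAST+).
top3_blast() {
    # $1 = query FASTA file, $2 = BLAST database name, $3 = output file
    blastn -query "$1" -db "$2" \
        -max_target_seqs 3 \
        -max_hsps 1 \
        -outfmt 6 \
        -out "$3"
}

# Example call (file and database names are placeholders):
# top3_blast queries.fasta my_nt_db top3_hits.tsv
```

With -outfmt 6 each output line is one HSP, so with these settings you should see at most 3 lines per query. Drop -max_hsps 1 if you want every HSP for the 3 subjects.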

For more information about the NCBI BLAST+ command-line options see:


Thank you very much, hpmcwill! I am sorry that my post was not clear enough. I was looking to limit the number of matches reported per hit, so -max_hsps did the job.

5.2 years ago
shinken123 ▴ 110

Starting from BLAST output produced with the option -outfmt 6, what about:

awk '!seen[$1]++' Blast_output_file.txt > Besthit_Blast_output_file.txt
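
For anyone wondering how the one-liner works: `seen[$1]++` returns the number of times the value in column 1 (the query ID) has been seen so far and then increments it. The first time a query ID appears the count is 0, so `!seen[$1]++` is true and awk prints the line; every later line for that query is skipped. This relies on the hits for each query already being listed best-first, which as far as I know is how BLAST normally writes them. A small sketch with made-up data, plus a variant that keeps up to 3 lines per query (the file names are placeholders):

```shell
# Tiny made-up tabular input: column 1 = query ID, column 2 = subject ID.
printf 'q1\ts1\nq1\ts2\nq1\ts3\nq1\ts4\nq2\ts9\n' > example_hits.tsv

# Original idiom: print a line only the first time its query ID is seen.
awk '!seen[$1]++' example_hits.tsv > best_hit.tsv

# Same idea, keeping up to n lines per query ID (here n = 3).
awk -v n=3 '++seen[$1] <= n' example_hits.tsv > top3_hits.tsv
```

Note that this counts output lines, not distinct subjects. If one subject has several HSPs, they each use up one of the 3 slots unless BLAST was run with -max_hsps 1.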

Hello,

Could you explain your awk command, please? I am very interested in it!

8.3 years ago
Neilfws 49k

The relevant options are in the BLAST handbook:

num_descriptions   integer   500   Show one-line descriptions for this number of database sequences.
num_alignments     integer   250   Show alignments for this number of database sequences.

Thank you very much for the link!

8.3 years ago
edrezen ▴ 730

Hi,

What output format are you using? I think these options may not work with the default BLAST output format.

If you try the tabular output format (just add -outfmt 6 to your command), it may work better.


Changing the format to option 6 didn't help.

8.3 years ago
Whoknows ▴ 920

Hi

Please run your query with the parameter -outfmt 6. The tabular output lets you pick out the hits with the highest similarity and also shows the number of mismatches for each hit. You can then sort it.

Alternatively, look at the -best_hit_overhang option if you only want the best hit from the BLAST search.
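
To make the sorting step concrete, here is a rough sketch, assuming the default -outfmt 6 columns, where column 1 is the query ID and column 12 is the bit score. The input lines and file names are made up for illustration. It sorts by query, then by bit score from highest to lowest, and keeps the first 3 lines for each query.

```shell
# Made-up -outfmt 6 lines (12 tab-separated columns; column 12 = bit score).
{
    printf 'q1\tsA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-20\t80.5\n'
    printf 'q1\tsB\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-50\t180\n'
    printf 'q2\tsC\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-30\t120\n'
    printf 'q1\tsD\t97.0\t100\t3\t0\t1\t100\t1\t100\t1e-40\t150\n'
    printf 'q1\tsE\t85.0\t100\t15\t0\t1\t100\t1\t100\t1e-10\t60.2\n'
} > unsorted_hits.tsv

# Sort by query ID (column 1), then by bit score (column 12) in
# descending order, and keep at most n lines per query.
# -g is the general-numeric sort of GNU/BSD sort (not strict POSIX);
# it also copes with scores written in scientific notation.
LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k12,12gr unsorted_hits.tsv |
    awk -F '\t' -v n=3 '++seen[$1] <= n' > sorted_top3.tsv
```

To rank by e-value instead, the same pattern should work with column 11 sorted in ascending order (-k11,11g).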


Hi,

Thanks for the response. I knew I could sort and pick the top hit but I just thought there should be a parameter while running blast that can limit the results (at least there was one for an older version).

Thanks!
