Question: parameter in blastn to limit the output to top 20 best hits
4.7 years ago by
J.F.Jiang750
China
J.F.Jiang750 wrote:

Hi all,

I am using blastn to scan regions for which I have no prior homology information.

However, for regions that share a lot of homologous sequence, BLAST generates a huge amount of output and takes a long time.

Therefore, I want to limit the output to the best hits, for example the top 20.

The command I used is as follows,

 

blastn -task blastn-short -db /data/WholeGenomeFasta/blast+/genome -query 1.fa -evalue 0.01 -num_threads 8 -outfmt "6 qseqid sseqid nident sstart send"  -num_alignments 20

 

However, it seems that the -num_alignments parameter did not work.

Is there a way to do this, and can I print the 20 alignments sorted by their total identical counts?

 

Thanks

 


Just to clarify, when you say your approach doesn't work, do you mean that regardless of the options you pass, the blastn takes a very long time, or do you mean that the ultimate output contains too many records?

written 4.7 years ago by Christian

Which Blast version are you using?

written 4.7 years ago by Biojl

2.2.9 blast+

written 4.7 years ago by J.F.Jiang
4.7 years ago by
Christian30
United States
Christian30 wrote:

Assuming you're using a newer version of BLAST+, perhaps using the -max_target_seqs 20 option in place of -num_alignments would get you directly to the data you're after. If that doesn't give you suitable results, you can probably thin out the number of hits using the -best_hit_overhang or -best_hit_score_edge options.

Take a look at http://www.ncbi.nlm.nih.gov/books/NBK1763/#CmdLineAppsManual.User_manual (if you haven't already) for more information; in Appendix C (about 3/4 down the page) you should see tables describing the various options you can pass.


I have already tried all the other parameters.

I also looked at the link you offered; I think it gives the same information as the blastn -h command.

If you can help, here is one example from my fa file:

>chr1:8920112-8920211
ATTTTTAGTAGAGATGGGGTTTCAACATATTGGCCAGGCTGGTCTCAAACTCCTGACCTTGTAATCCACCAGCCTCAGCCTCCCAAAGTGCTGGGATTAG

thanks.

written 4.7 years ago by J.F.Jiang
4.7 years ago by
Biojl1.6k
Barcelona
Biojl1.6k wrote:

I think the outfmt is wrong. You should only type the number of the output format you want (6 in this case) and nothing else. The quotes are preventing the rest of the command from executing correctly. Then use -max_target_seqs, which is the correct option for BLAST 2.2.29+:

blastn -task blastn-short -db /data/WholeGenomeFasta/blast+/genome -query 1.fa -evalue 0.01 -num_threads 8 -outfmt 6 -max_target_seqs 20
ADD COMMENTlink written 4.7 years ago by Biojl1.6k
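
For the second part of the question (printing the 20 alignments with the most identical matches), one option is to post-process the tabular output instead of relying on BLAST options alone, since -max_target_seqs caps the number of subject sequences kept per query, not the number of alignment rows. The following is a minimal sketch rather than a tested pipeline: it assumes the five-column layout from the question's -outfmt string (qseqid sseqid nident sstart send), and the function name top_hits and the demo rows are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# top_hits (hypothetical helper): keep the N rows with the highest nident
# for each query. Assumes tab-separated input on stdin with the columns
# qseqid sseqid nident sstart send, as in the question's -outfmt string.
top_hits() {
    n="${1:-20}"
    # Sort by query ID (column 1), then by nident (column 3), largest first.
    sort -t "$(printf '\t')" -k1,1 -k3,3nr |
    # Print a row only while fewer than n rows have been printed for its query.
    awk -F '\t' -v n="$n" '++seen[$1] <= n'
}

# Demo on made-up rows (not real BLAST output): keep the top 2 rows per query.
printf 'q1\tchrA\t90\t100\t199\nq1\tchrB\t100\t5\t104\nq1\tchrC\t95\t7\t106\nq2\tchrA\t80\t1\t80\n' |
    top_hits 2
```

With real data this would be run on the blastn output instead, for example top_hits 20 < hits.tsv > top20.tsv, where hits.tsv is an assumed file name for the saved tabular results. Rows with equal nident are kept in whatever order sort emits them.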