parameter in blastn to limit the output to top 20 best hits
9.6 years ago
J.F.Jiang ▴ 910

Hi all,

I am using blastn to scan regions for which I have no prior homology information.

However, for regions that share extensive homologous sequence, BLAST generates a huge amount of output and takes a long time.

Therefore, I want to limit the output to the best hits, for example the top 20.

The command I used is as follows:

blastn \
  -task blastn-short \
  -db /data/WholeGenomeFasta/blast+/genome \
  -query 1.fa \
  -evalue 0.01 \
  -num_threads 8 \
  -outfmt "6 qseqid sseqid nident sstart send" \
  -num_alignments 20

However, the -num_alignments parameter does not seem to have any effect.

Is there a way to do this? Also, can I print the 20 alignments sorted by the number of identical matches (nident)?

Thanks


Just to clarify, when you say your approach doesn't work, do you mean that regardless of the options you pass, the blastn takes a very long time, or do you mean that the ultimate output contains too many records?


Which Blast version are you using?


2.2.9 blast+

9.6 years ago
Christian ▴ 30

Assuming you're using a newer version of BLAST+, using the -max_target_seqs 20 option in place of -num_alignments may get you directly to the data you're after. If that doesn't give you suitable results, you can probably thin out the number of hits using the -best_hit_overhang or -best_hit_score_edge options.

Take a look at http://www.ncbi.nlm.nih.gov/books/NBK1763/#CmdLineAppsManual.User_manual (if you haven't already) for more information; in Appendix C (about 3/4 down the page) you should see tables describing the various options you can pass.


I have already tried all of the other parameters.

I also looked at the link you offered; I think it gives the same information as the blastn -h command.

If it helps, here is one example from my FASTA file:

>chr1:8920112-8920211
ATTTTTAGTAGAGATGGGGTTTCAACATATTGGCCAGGCTGGTCTCAAACTCCTGACCTTGTAATCCACCAGCCTCAGCCTCCCAAAGTGCTGGGATTAG

Thanks

9.6 years ago
Biojl ★ 1.7k

I think the -outfmt value is wrong. You should pass only the number of the output format you want (6 in this case) and nothing else. I suspect the quotes are preventing the rest of the command from executing correctly. Then use -max_target_seqs, which is the correct option for BLAST 2.2.29+.

blastn -task blastn-short -db /data/WholeGenomeFasta/blast+/genome -query 1.fa -evalue 0.01 -num_threads 8 -outfmt 6 -max_target_seqs 20
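
For the second part of the question (sorting by identical matches), one possible post-processing step is sketched here. It assumes tab-separated rows in the five-column layout requested in the question (qseqid, sseqid, nident, sstart, send); the default columns of plain -outfmt 6 do not include nident. The function name top_hits_by_nident is hypothetical and not part of BLAST+. Note also that -max_target_seqs limits the number of subject sequences rather than the number of rows, so a genome database with a few long subjects can still return many rows per query.

```shell
# Hypothetical helper (not part of BLAST+): keep the N rows with the
# highest nident for each query, reading tabular BLAST output on stdin.
# Assumed columns: qseqid sseqid nident sstart send (tab-separated).
top_hits_by_nident() {
    # Sort by query ID, then by nident (column 3), numeric, descending.
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k3,3nr |
        # Count rows seen per query and print only the first N of each.
        awk -F '\t' -v n="${1:-20}" '++seen[$1] <= n'
}

# Intended use (assumes blastn is installed and the paths exist):
#   blastn ... -outfmt "6 qseqid sseqid nident sstart send" | top_hits_by_nident 20
```

Filtering after the search reduces the size of the output only; it does not shorten the run time of the search itself.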