I am trying to set the maximum number of hits to 5, so that the procedure can finish sooner, but I still get hundreds of hits found.
# TBLASTX 2.2.29+
# Query: Locus_40_Transcript_185/186_Confidence_0.224_Length_4778
# Database: ../../Genome/Genome
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 714 hits found
I am running:
tblastx -db ../../Genome/Genome -query all_merged_k125.fa -evalue 1e-10 -outfmt 7 -out tblastx/all_merged_k125.fmt7 -num_threads 16 -max_target_seqs 5
Any idea why it's still reporting so many hits?
OK, that makes sense. How do I limit the number of hits?
I would do it post blast (with
Make sure the file is sorted based on query and best hits (here bitscore > evalue > perc identity):
Then get the top 5 hits for every query:
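A minimal sketch of those two steps, assuming the 12-column tabular output shown in the question (% identity in column 3, e-value in column 11, bit score in column 12). The function name `top_hits` is made up for illustration, and `sort -g` (general numeric, needed for e-values like 1e-50) is assumed to be available, as it is in GNU sort:

```shell
# Hypothetical helper: print the best N rows per query from BLAST tabular output.
# $1: BLAST tabular file (-outfmt 6 or 7); $2: rows to keep per query (default 5)
top_hits() {
    tab=$(printf '\t')
    # 1) drop the '#' comment lines that -outfmt 7 adds
    # 2) sort by query, then bit score (descending), e-value (ascending),
    #    % identity (descending)
    # 3) keep only the first N rows seen for each query id
    grep -v '^#' "$1" \
        | LC_ALL=C sort -t "$tab" -k1,1 -k12,12gr -k11,11g -k3,3gr \
        | awk -F'\t' -v n="${2:-5}" '++seen[$1] <= n'
}
```

For example, `top_hits tblastx/all_merged_k125.fmt7 5 > top5.tsv` (the output name is hypothetical). Note that this keeps the 5 best rows per query, so several of them can be HSPs from the same subject.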
Yes, the above command made my day!!!!!!!!!
-max_target_seqs to 1 will give only 1 subject/hit, but several HSPs if they are present.
-max_hsps to 1 will give only 1 HSP per subject, but for all subjects/hits in the database.
If you really want only 5 HSPs per subject, set the
-max_target_seqs to 1 and
Makes sense, thank you!
(typo: should be
I guess you can always give a relatively stringent e-value and filter the resulting hits later.
What I wanted is to speed up the blasting.
I doubt limiting the number of hits like that would speed up your blasting significantly. It still has to go through the whole db for every query, so the only difference would be in how long it takes to write 5 or 10 lines (or whatever) to the output file. Instead, if your db is small (or you have a ton of RAM), you should parallelize blast (e.g. with GNU Parallel) by running multiple single-threaded blasts on split input instead of using
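The split-input approach could look roughly like this. It is a sketch only: `split_fasta` and the `chunks/query` prefix are hypothetical names, the tblastx options are copied from the question, and the GNU Parallel call is left commented out because it depends on your setup (number of cores, available RAM):

```shell
# Hypothetical helper: distribute FASTA records round-robin over N chunk files.
# $1: input FASTA; $2: number of chunks; $3: output prefix
# Writes <prefix>.000.fa, <prefix>.001.fa, ... (assumes the file starts with '>')
split_fasta() {
    awk -v n="$2" -v prefix="$3" '
        /^>/ { file = sprintf("%s.%03d.fa", prefix, rec++ % n) }  # new record: pick next chunk
        { print > file }                                          # header and sequence lines
    ' "$1"
}

# Usage sketch (not run here): one single-threaded tblastx per chunk, 16 at a time.
# mkdir -p chunks
# split_fasta all_merged_k125.fa 16 chunks/query
# ls chunks/query.*.fa | parallel -j 16 \
#     'tblastx -db ../../Genome/Genome -query {} -evalue 1e-10 -outfmt 7 -num_threads 1 -out {.}.fmt7'
```

Round-robin assignment keeps the chunks roughly equal in record count; if your transcripts vary a lot in length, splitting by total sequence length may balance the jobs better.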
^True. You will benefit from multi-threading, and trying both
blastall -p tblastx before choosing one of them. For shorter query sequences, I've seen the latter be significantly faster than the former.