Hello everyone, I'm using Blast+ 2.4.0
This is my question and it bothered me for a while. And this is not duplicate of previous questions like Standalone Blast Options Blastall gives several hits to the same subject sequence
What I want to is to keep best hits in blast result when encountering frame shift
Here is my command with no extra flags.
nohup tblastn -query ../query.fasta -db ~/database/nr -out result.xls -outfmt "6 qseqid sseqid pident mismatch qlen length qstart qend sstart send evalue bitscore" -evalue 1e-5
Here is what I've got:
TCONS_00000346 gi|1048050614|ref|XP_017480395.1| 66.667 78.67 2 0 75 25 0 14 238 3 77 2.60e-50 104
TCONS_00000346 gi|1048050614|ref|XP_017480395.1| 66.667 85.33 3 0 75 25 0 258 482 78 152 2.60e-50 102
TCONS_00000346 gi|1048050614|ref|XP_017480395.1| 59.259 81.48 1 0 27 11 0 499 579 152 178 2.60e-50 40.4
TCONS_00000346 gi|952520900|gb|KRT81173.1| 66.667 25 579 75 14 238 3 77 5.98e-50 105
TCONS_00000346 gi|952520900|gb|KRT81173.1| 68.000 24 579 75 258 482 78 152 5.98e-50 101
TCONS_00000346 gi|952520900|gb|KRT81173.1| 54.839 14 579 31 487 579 148 178 5.98e-50 39.7
blablabla . .
Multiple subjects were observed and obviously it's because the frameshift, I want to extract the best hits with frameshifts information kept (for downstreaming CDS prediction)
Then I've tried two ways to get (some sort of) the "best hits" (Failed...)
- I tried the "-max_hsps 1" flag
And I got:
TCONS_00000346 gi|1048050614|ref|XP_017480395.1| 66.667 25 579 75 14 238 3 77 2.55e-50 104
TCONS_00000346 gi|952520900|gb|KRT81173.1| 66.667 25 579 75 14 238 3 77 5.98e-50 105
blablabla
.
.
.
It simply remove other two hits and the frameshift information was removed too.
- I tried sorting the result
export LANG=C; export LC_ALL=C; sort -k1,1 -k12,12gr -k11,11g -k3,3gr test | sort -u -k1,1 --merge > bestHits
TCONS_00000346 gi|1046525669|gb|OCK29182.1| 70.667 22 579 75 258 482 1171 1245 1.10e-45 114
Undoubtly I got only one subject with highest e-value for one query, this is also not what I want.
I also check the blast manual and I didn't find relative solutions except by writting a script.
So in brief, my question is:
Does loacl blast have a build-in argument to tell apart the real "redundant" subject hits and the frameshift-caused (but biological making sense) multiple subjects hits, and kept the latter one in blast output?
I don't think there's a solution using blast options. I would post-process the output. For example you could pipe the output into a perl script filter.
Thanks a lot Heriche! I decide to write a script. Thanks!