Question: It should be easy, but...How to keep best hits in blast result when encountering frame shift?
0
gravatar for qingxiangg
3.3 years ago by
qingxiangg30
qingxiangg30 wrote:

Hello everyone, I'm using Blast+ 2.4.0

This is my question and it bothered me for a while. And this is not duplicate of previous questions like Standalone Blast Options Blastall gives several hits to the same subject sequence

What I want to is to keep best hits in blast result when encountering frame shift

Here is my command with no extra flags.

nohup tblastn -query ../query.fasta -db ~/database/nr -out result.xls -outfmt "6 qseqid sseqid pident mismatch qlen length qstart qend sstart send evalue bitscore" -evalue 1e-5

Here is what I've got:

TCONS_00000346 gi|1048050614|ref|XP_017480395.1| 66.667 78.67 2 0 75 25 0 14 238 3 77 2.60e-50 104 TCONS_00000346 gi|1048050614|ref|XP_017480395.1| 66.667 85.33 3 0 75 25 0 258 482 78 152 2.60e-50 102 TCONS_00000346 gi|1048050614|ref|XP_017480395.1| 59.259 81.48 1 0 27 11 0 499 579 152 178 2.60e-50 40.4 TCONS_00000346 gi|952520900|gb|KRT81173.1| 66.667 25 579 75 14 238 3 77 5.98e-50 105
TCONS_00000346 gi|952520900|gb|KRT81173.1| 68.000 24 579 75 258 482 78 152 5.98e-50 101
TCONS_00000346 gi|952520900|gb|KRT81173.1| 54.839 14 579 31 487 579 148 178 5.98e-50 39.7

blablabla . .

Multiple subjects were observed and obviously it's because the frameshift, I want to extract the best hits with frameshifts information kept (for downstreaming CDS prediction)

Then I've tried two ways to get (some sort of) the "best hits" (Failed...)

  1. I tried the "-max_hsps 1" flag

And I got: TCONS_00000346 gi|1048050614|ref|XP_017480395.1| 66.667 25 579 75 14 238 3 77 2.55e-50 104
TCONS_00000346 gi|952520900|gb|KRT81173.1| 66.667 25 579 75 14 238 3 77 5.98e-50 105
blablabla . . . It simply remove other two hits and the frameshift information was removed too.

  1. I tried sorting the result

export LANG=C; export LC_ALL=C; sort -k1,1 -k12,12gr -k11,11g -k3,3gr test | sort -u -k1,1 --merge > bestHits

TCONS_00000346 gi|1046525669|gb|OCK29182.1| 70.667 22 579 75 258 482 1171 1245 1.10e-45 114

Undoubtly I got only one subject with highest e-value for one query, this is also not what I want.

I also check the blast manual and I didn't find relative solutions except by writting a script.

So in brief, my question is:

Does loacl blast have a build-in argument to tell apart the real "redundant" subject hits and the frameshift-caused (but biological making sense) multiple subjects hits, and kept the latter one in blast output?

blast • 877 views
ADD COMMENTlink written 3.3 years ago by qingxiangg30

I don't think there's a solution using blast options. I would post-process the output. For example you could pipe the output into a perl script filter.

ADD REPLYlink written 3.3 years ago by Jean-Karim Heriche21k

Thanks a lot Heriche! I decide to write a script. Thanks!

ADD REPLYlink written 3.3 years ago by qingxiangg30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1798 users visited in the last hour