Extract 10 matches in BLAST
5.2 years ago
gkalgus • 0

I have a blast output in outfmt 6 format, like this:

JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  20.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287

But I have a zillion lines, and each subject has more than 100 matches. How can I extract only the first 10 matches of each subject?

Thanks.

blast extract • 1.5k views

If you are running NCBI blastn from the command line, you can limit the number of results displayed per query using -num_descriptions (or possibly -num_alignments; one controls the one-line summaries and the other controls the ASCII alignment views that are output in the default format). The default setting is pretty high, around 250-500 results per query. BLAST shows the best-scoring hits first and reports as many as you request. If many top-scoring hits are tied, I think it's arbitrary which of them get shown.

If you are running it from the NCBI website you'll have to poke around their blast page. I don't recall if those are settings you can set using their web tool.


I am running a local BLAST, but the parameter num_descriptions works only with outfmt 4 or less, and the parameter num_alignments returns a variable number of matches. For example, I set num_alignments to 10 and several subjects returned 13 or more matches...

5.2 years ago
h.mon 35k

Your example is really weird, as there are several identical lines. In fact, the only difference is in the third column of the second line, which has a value of "20.21" instead of "91.21". I can't imagine blast producing such output, unless your query consists of the same sequence repeated several times.

I put the parameter num_alignments 10 and several subjects returned 13 or more matches...

The same subject is probably returning more than one HSP. To solve this, use -max_hsps 1.
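If rerunning the search is not convenient, a similar effect can be approximated on the existing table. This is a minimal sketch, assuming whitespace- or tab-separated outfmt 6 rows with the query ID in column 1 and the subject ID in column 2, and assuming the rows are still in the order BLAST wrote them, so that the first row for a pair is the one you want to keep. The function name first_hsp_per_pair is only illustrative and is not part of BLAST.

```shell
# Keep only the first row seen for each (query, subject) pair.
# Reads the files given as arguments, or standard input if none are given.
# first_hsp_per_pair is a hypothetical helper name, not a BLAST tool.
first_hsp_per_pair() {
    # seen[query, subject] is 0 the first time a pair appears, so the
    # pattern is true (and the row is printed) only on that first row.
    awk '!seen[$1, $2]++' "$@"
}
```

It could be used as first_hsp_per_pair blast_tabular.tsv > one_hsp_per_hit.tsv, where the output file name is arbitrary.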

But I have a zillion lines, and each subject has more than 100 matches. How can I extract only the first 10 matches of each subject?

In blast terminology, the database sequences are the "subject", and the sequences one submits are the "query". I suspect you want to keep the first 10 matches of each query, correct? You can easily achieve this with perl or awk. Here is an awk solution:

awk '{ if (++count[$1] <= 10) print $0 }' blast_tabular.tsv
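The one-liner above counts rows, so a subject with several HSPs uses up several of the 10 slots. If the goal is instead the first 10 distinct subjects per query, with all of their HSP rows, a variation along these lines might work. It is a sketch under the same assumptions about the input (query ID in column 1, subject ID in column 2, rows for the same query and subject adjacent, as BLAST normally writes them), and top_hits is a hypothetical helper name.

```shell
# top_hits N [FILE...]: print the rows that belong to the first N distinct
# subjects of each query. Reads standard input if no file is given.
# top_hits is a hypothetical helper name, not a BLAST tool.
top_hits() {
    n="$1"; shift
    awk -v n="$n" '
        # First time this (query, subject) pair appears: count a new subject.
        !seen[$1, $2]++ { subjects[$1]++ }
        # Print the row while the query is still within its first n subjects.
        subjects[$1] <= n
    ' "$@"
}
```

For example, top_hits 10 blast_tabular.tsv > top10.tsv would keep every HSP row for the first 10 subjects of each query; piping the result through a first-row-per-pair filter would reduce that to one row per subject.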