BLAST, set a limit to the number of hits in output
1
0
Entering edit mode
11 weeks ago
ashenflower ▴ 20

Hello evrybody, I am currently using blastn on several contigs, with -outfmt 6, and I get something like:

ID_1     xxx    xxx   xxx   xxx 
ID_1     xxx    xxx   xxx   xxx
ID_1     xxx    xxx   xxx   xxx
ID_1     xxx    xxx   xxx   xxx
ID_1     xxx    xxx   xxx   xxx
ID_1     xxx    xxx   xxx   xxx
ID_2     xxx    xxx   xxx   xxx
ID_2     xxx    xxx   xxx   xxx
ID_2     xxx    xxx   xxx   xxx
ID_2     xxx    xxx   xxx   xxx
ID_2     xxx    xxx   xxx   xxx
ID_2     xxx    xxx   xxx   xxx
ID_2     xxx    xxx   xxx   xxx

Now, I would like to limit the number of hits returned in the output (i.e. the nr of lines for each ID) because the output file is too huge and I don't have enough memory to store them all and filtering it later. Is there any way to do it? I already tried with the parameters -max_target_seqs and -max_hsps, but it's not working.

blastn output blast hits • 531 views
ADD COMMENT
1
Entering edit mode

-num_alignments will do the job

ADD REPLY
0
Entering edit mode

Thank you, I'll give it a try!

ADD REPLY
0
Entering edit mode

I already tried with the parameters -max_target_seqs and -max_hsps, but it's not working.

What do you mean , "it's not working" ? Why do you think so?

ADD REPLY
0
Entering edit mode

I tried to use them on a subset of sequences, and the number of results in output for each sequence was the same with or without those parameters

ADD REPLY
0
Entering edit mode
11 weeks ago
Asaf 9.4k

Yes, the max_hsps limit the number of aligned regions within each pair of subject-query. You can run sort with a pipe (|) so the file is not written to disk, only after it's sorted and the best hit is chosen (sort according to subject and e-value then sort again with -m to keep only one hit per subject line)

ADD COMMENT
0
Entering edit mode

what would the -m option of sort do then?

ADD REPLY
0
Entering edit mode

It doesn't sort the file, it uses the previous sort and then when you apply -u (unique) using the first column you will keep the best hit.

ADD REPLY
0
Entering edit mode

interesting, never used the -u in combination with -k, let alone -m ...

gives pretty confusing output though :/

sort testSort
a       10
a       10
a       12
b       12
b       12

sort -u testSort
a       10
a       12
b       12

sort -u -k1,1 testSort
a       12
b       12

sort -u -k1 testSort
a       10
a       12
b       12

sort -u -m -k1,1 testSort
a       12
b       12
a       10
b       12
a       10

and I know it's not stackoverflow here, but still :)

even after decades working in linux , some of its commands and especially option usage amaze me ...

ADD REPLY
1
Entering edit mode

And

 sort testSort |sort -m -u -k 1,1      
  a       10
  b       12             

Which is what want in this case.

ADD REPLY
0
Entering edit mode

nice , #TIL

though I get the same output with or without -m ?

and this only seems to work when using sort twice ??

even after decades working in linux , some of its commands and especially option usage amaze me ...

and now even more so :)

ADD REPLY
0
Entering edit mode

It will also work without the -m here but in other cases where the column you merge is not the first column you sorted by it will be necessary.

ADD REPLY

Login before adding your answer.

Traffic: 1244 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6