BLAST, set a limit to the number of hits in output
1
0
Entering edit mode
11 weeks ago
ashenflower ▴ 20

Hello evrybody, I am currently using blastn on several contigs, with -outfmt 6, and I get something like:

ID_1     xxx    xxx   xxx   xxx
ID_1     xxx    xxx   xxx   xxx
ID_1     xxx    xxx   xxx   xxx
ID_1     xxx    xxx   xxx   xxx
ID_1     xxx    xxx   xxx   xxx
ID_1     xxx    xxx   xxx   xxx
ID_2     xxx    xxx   xxx   xxx
ID_2     xxx    xxx   xxx   xxx
ID_2     xxx    xxx   xxx   xxx
ID_2     xxx    xxx   xxx   xxx
ID_2     xxx    xxx   xxx   xxx
ID_2     xxx    xxx   xxx   xxx
ID_2     xxx    xxx   xxx   xxx


Now, I would like to limit the number of hits returned in the output (i.e. the nr of lines for each ID) because the output file is too huge and I don't have enough memory to store them all and filtering it later. Is there any way to do it? I already tried with the parameters -max_target_seqs and -max_hsps, but it's not working.

blastn output blast hits • 531 views
1
Entering edit mode

-num_alignments will do the job

0
Entering edit mode

Thank you, I'll give it a try!

0
Entering edit mode

I already tried with the parameters -max_target_seqs and -max_hsps, but it's not working.

What do you mean , "it's not working" ? Why do you think so?

0
Entering edit mode

I tried to use them on a subset of sequences, and the number of results in output for each sequence was the same with or without those parameters

0
Entering edit mode
11 weeks ago
Asaf 9.4k

Yes, the max_hsps limit the number of aligned regions within each pair of subject-query. You can run sort with a pipe (|) so the file is not written to disk, only after it's sorted and the best hit is chosen (sort according to subject and e-value then sort again with -m to keep only one hit per subject line)

0
Entering edit mode

what would the -m option of sort do then?

0
Entering edit mode

It doesn't sort the file, it uses the previous sort and then when you apply -u (unique) using the first column you will keep the best hit.

0
Entering edit mode

interesting, never used the -u in combination with -k, let alone -m ...

gives pretty confusing output though :/

sort testSort
a       10
a       10
a       12
b       12
b       12

sort -u testSort
a       10
a       12
b       12

sort -u -k1,1 testSort
a       12
b       12

sort -u -k1 testSort
a       10
a       12
b       12

sort -u -m -k1,1 testSort
a       12
b       12
a       10
b       12
a       10


and I know it's not stackoverflow here, but still :)

even after decades working in linux , some of its commands and especially option usage amaze me ...

1
Entering edit mode

And

 sort testSort |sort -m -u -k 1,1
a       10
b       12


Which is what want in this case.

0
Entering edit mode

nice , #TIL

though I get the same output with or without -m ?

and this only seems to work when using sort twice ??

even after decades working in linux , some of its commands and especially option usage amaze me ...

and now even more so :)

0
Entering edit mode

It will also work without the -m here but in other cases where the column you merge is not the first column you sorted by it will be necessary.