Question

When will the BLAST program stop giving out hits if there are too many (for web server and standalone)?

5.5 years ago

johnnytam100 ▴ 110

I have a protein domain of interest.

I want to search for standalone proteins in which that domain makes up most of the sequence length.

I can think of two ways to do this:

Method 1: blastp against all nr sequences -> filter the hits by desired length -> my result

Method 2: filter nr sequences by desired length -> build BLAST database -> blastp -> my result

I prefer method 2 because I think that with method 1, the overwhelming number of hits outside the desired length will crowd out the hits I want.
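The length pre-filter in method 2 could be sketched roughly as follows. This is only a minimal sketch, assuming the nr sequences are available as a plain FASTA file; the function names `read_fasta` and `filter_by_length`, the toy sequences, and the length bounds are hypothetical and chosen purely for illustration.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len: int, max_len: int):
    """Keep only records whose sequence length lies in [min_len, max_len]."""
    for header, seq in records:
        if min_len <= len(seq) <= max_len:
            yield header, seq


# Toy input standing in for the real nr FASTA file (hypothetical data).
example = """>short
MKT
>medium
MKTAYIAKQRQISFVKSHFSRQ
>long
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV
""".splitlines()

kept = list(filter_by_length(read_fasta(example), min_len=10, max_len=40))
for header, seq in kept:
    print(f">{header}\n{seq}")
```

On a real file, the lines would come from an open file handle instead of the toy string, and the filtered FASTA would then be passed to `makeblastdb` with `-dbtype prot` to build the database.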

Of course I could simply test both methods, but I would like to know:

1) How do you compare the two methods?

2) When will the BLAST program stop giving out hits if there are too many (for web server and standalone)?

Thank you.

blast • 943 views

updated 5.4 years ago by Biostar 20 • written 5.5 years ago by johnnytam100 ▴ 110

I would go for option 1. It is less biased and, I think, quicker than the subsampling approach.

To get all the hits you want, be sure to set num_alignments or max_target_seqs high enough. Depending on the input, you might also consider raising the e-value threshold.

As for the second part of your question: BLAST will stop reporting hits as soon as either of the thresholds I mentioned above is reached.
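For a standalone run, both limits can be set explicitly on the command line. Below is a minimal sketch that only assembles such a command, assuming BLAST+ `blastp` is installed and tabular output (`-outfmt 6`) is used, where `-max_target_seqs` is the relevant cap. The helper `build_blastp_command`, the file names, the database name and the numeric values are all placeholders, not recommendations.

```python
import shlex
from typing import List


def build_blastp_command(
    query: str,
    db: str,
    out: str,
    max_target_seqs: int = 5000,  # placeholder cap on reported subjects
    evalue: float = 10.0,         # placeholder e-value threshold
) -> List[str]:
    """Assemble a blastp argument list with explicit hit-count and e-value limits."""
    return [
        "blastp",
        "-query", query,
        "-db", db,
        "-out", out,
        # Tabular output; slen (subject length) is included so that hits
        # can be filtered by length afterwards.
        "-outfmt", "6 qseqid sseqid pident length evalue bitscore slen",
        "-max_target_seqs", str(max_target_seqs),
        "-evalue", str(evalue),
    ]


cmd = build_blastp_command("domain.fasta", "nr", "hits.tsv")
print(shlex.join(cmd))

# To actually run it (requires BLAST+ and a formatted database):
# import subprocess
# subprocess.run(cmd, check=True)
```

Including `slen` in the output columns is a deliberate choice here: it keeps the length filter of method 1 a simple post-processing step on the results table.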

5.5 years ago by lieven.sterck 15k

Good point about bias! I want to know whether BLAST will in any case output all results that fall within my constraints. How do num_alignments and max_target_seqs affect whether I get all the hits I want?

5.5 years ago by johnnytam100 ▴ 110

Theoretically you can set them as high as the number of entries in your database, but I think you normally should not need to go to that extreme.

Running standalone BLAST with such a small input should not take too long, so you have the opportunity to try a few values for those parameters.
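When trying a few values, one rough way to tell whether the cap was too low is to compare the number of distinct subjects returned per query with the cap itself: if they are equal, hits were probably cut off and the value should be raised. The sketch below assumes tabular output with the columns `qseqid sseqid pident length evalue bitscore slen` (this layout is an assumption, not something BLAST produces by default); the function names and the toy hits are hypothetical.

```python
import csv
import io

# Assumed column layout of the tabular BLAST output.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "evalue", "bitscore", "slen"]


def parse_hits(handle):
    """Read tab-separated BLAST hits into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t"))


def cap_reached(hits, max_target_seqs: int) -> bool:
    """True if any query has at least as many distinct subjects as the cap."""
    subjects = {}
    for hit in hits:
        # A subject can appear on several rows (one per alignment),
        # so distinct subject IDs are counted with a set.
        subjects.setdefault(hit["qseqid"], set()).add(hit["sseqid"])
    return any(len(s) >= max_target_seqs for s in subjects.values())


def within_length(hits, min_len: int, max_len: int):
    """Keep hits whose subject length lies in [min_len, max_len]."""
    return [h for h in hits if min_len <= int(h["slen"]) <= max_len]


# Toy results standing in for a real output file (hypothetical data).
example = io.StringIO(
    "dom\tsubjA\t98.0\t120\t1e-60\t250\t130\n"
    "dom\tsubjB\t75.0\t118\t1e-40\t180\t900\n"
    "dom\tsubjC\t60.0\t110\t1e-20\t120\t140\n"
)
hits = parse_hits(example)

print(cap_reached(hits, max_target_seqs=3))
print([h["sseqid"] for h in within_length(hits, 100, 200)])
```

If `cap_reached` reports that the cap was hit, rerunning with a larger value and checking again is a cheap way to converge on a setting that returns everything within the e-value threshold.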

5.5 years ago by lieven.sterck 15k