I have a protein domain of interest.
I want to search for standalone proteins in which that domain makes up most of the sequence length.
I can think of two methods to do the job:
Method 1: blastp against all nr sequences -> filter the hits to those within the desired length -> my result
Method 2: filter the nr sequences to those within the desired length -> build a blast database -> blastp -> my result
I prefer method 2. With method 1, I think the overwhelming number of hits outside the desired length range will crowd out the hits I actually want.
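For reference, the length pre-filter step of method 2 could be sketched roughly as below. This is only a minimal sketch in plain Python, assuming the nr sequences are available as a plain FASTA file; the function names (read_fasta, filter_by_length, write_fasta) and the length cutoffs in the demo are made up for illustration.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(handle, min_len, max_len):
    """Keep only records whose sequence length is within [min_len, max_len]."""
    for header, seq in read_fasta(handle):
        if min_len <= len(seq) <= max_len:
            yield header, seq


def write_fasta(records, out, width=60):
    """Write (header, sequence) pairs as FASTA, wrapping sequence lines."""
    for header, seq in records:
        out.write(f">{header}\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    # Toy input standing in for the real nr FASTA file (hypothetical data).
    toy = io.StringIO(">short\nMKV\n>medium\nMKVLAAGGTT\n>tiny\nM\n")
    result = io.StringIO()
    # Hypothetical cutoffs: keep sequences of 3 to 10 residues.
    write_fasta(filter_by_length(toy, 3, 10), result)
    print(result.getvalue(), end="")
```

The filtered FASTA file would then be the input to makeblastdb (with -dbtype prot), and blastp would be run against the resulting database.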
Of course I could simply test both methods myself, but I would like to know:
1) How do the two methods compare?
2) At what point does the blast program stop reporting hits when there are too many (for both the web server and the standalone version)?