I have an HMM database of 83 HMMs.
I want to use this to pull out all of the hits in NR, so there are many millions of sequences to search.
Would I use hmmsearch or hmmscan for this?
See here for previous discussions on this issue. It isn't difficult to figure this out on your own: make a small database in lieu of nr, say 1000 proteins, and do both kinds of searches. It should be pretty obvious what works better for your purposes.
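A minimal sketch of building such a small test database, assuming nr is available as a plain FASTA file. The file names `nr.fasta` and `nr_small.fasta` and the function `first_fasta_records` are hypothetical, chosen only for illustration.

```python
def first_fasta_records(lines, n):
    """Yield the lines belonging to the first n FASTA records."""
    seen = 0
    for line in lines:
        if line.startswith(">"):
            seen += 1
            if seen > n:
                break  # stop reading as soon as record n+1 begins
        if seen > 0:
            yield line


# Example usage (hypothetical file names):
# with open("nr.fasta") as src, open("nr_small.fasta", "w") as dst:
#     dst.writelines(first_fasta_records(src, 1000))
```

Taking the first 1000 records is the simplest choice; a random sample would be more representative but requires a pass over the whole file.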
If you use a tabular output and the same database size (the -Z switch), you will get identical results with either approach, but hmmsearch will be at least 2-3x faster, possibly even 10x.
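To check this on a small test set, something along these lines might work. It is only a sketch: the output file names, the helper names `build_commands` and `read_pairs`, and the CPU count are assumptions. The flags used (`--tblout`, `-Z`, `--cpu`, `--noali`, `-o`) are standard HMMER 3 options, and hmmscan additionally needs the HMM file to have been prepared with `hmmpress` first.

```python
def build_commands(hmm_db, seq_db, z, cpus=4):
    """Return (hmmsearch, hmmscan) argument lists using the same -Z value."""
    common = ["-Z", str(z), "--cpu", str(cpus), "--noali", "-o", "/dev/null"]
    hmmsearch = ["hmmsearch", "--tblout", "hmmsearch.tbl", *common, hmm_db, seq_db]
    # hmmscan takes the same argument order: <hmmdb> <seqfile>
    hmmscan = ["hmmscan", "--tblout", "hmmscan.tbl", *common, hmm_db, seq_db]
    return hmmsearch, hmmscan


def read_pairs(tbl_lines, program):
    """Collect (hmm_name, sequence_name) pairs from --tblout lines.

    In --tblout, column 1 is the target name and column 3 the query name.
    For hmmsearch the query is the HMM; for hmmscan the query is the sequence.
    """
    pairs = set()
    for line in tbl_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        fields = line.split()
        target, query = fields[0], fields[2]
        if program == "hmmsearch":
            pairs.add((query, target))
        else:
            pairs.add((target, query))
    return pairs


# Example usage (hypothetical file names):
# import subprocess
# for cmd in build_commands("my83.hmm", "nr_small.fasta", z=1000):
#     subprocess.run(cmd, check=True)
# with open("hmmsearch.tbl") as a, open("hmmscan.tbl") as b:
#     print(read_pairs(a, "hmmsearch") == read_pairs(b, "hmmscan"))
```

Normalizing both tables to (HMM, sequence) pairs makes the comparison independent of which program treats which side as the query.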
If you want the alignments, with hmmsearch you will get results that are easier to inspect. Basically you will have 83 chunks of results in the output, where for each HMM the hits will be listed and aligned. With hmmscan you will get 300+ million chunks, because each individual sequence from nr will be searched and aligned against all your HMMs. I would not want to do that. Given that hmmsearch is also faster, to me that's a clear winner.
I would use hmmscan only for a relatively small number of sequences, for example to annotate a proteome of a single species against the Pfam database.
Thank you, this is what I was looking for.