I want to identify protein domains in predicted ORFs from a de novo metatranscriptome assembly. Suppose I use hmmsearch, because it is computationally faster than hmmscan (see extra info below), to compare the HMM profiles in the PFAM database against my predicted peptide database, and I set the -Z option to the number of HMMs in PFAM. Will the E-values and output of hmmsearch be identical to those of hmmscan run on the same files (minus the -Z option)?
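For reference, the value I would pass to -Z could be obtained with something like the sketch below. It assumes the profile database is a HMMER3 ASCII file in which every profile record ends with a line containing only `//`; the function name `count_profiles` and the file name in the usage note are placeholders of mine, not part of HMMER.

```python
import sys


def count_profiles(hmm_path):
    """Count profiles in a HMMER3 ASCII file.

    Assumption: each profile record is terminated by a line that
    contains only '//', so counting those lines counts the profiles.
    """
    count = 0
    with open(hmm_path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.rstrip("\r\n") == "//":
                count += 1
    return count


if __name__ == "__main__" and len(sys.argv) == 2:
    # Print the number that would be handed to hmmsearch via -Z.
    print(count_profiles(sys.argv[1]))
```

Run as a script with the path to the profile file (for example a local copy named Pfam-A.hmm, if that is what your download is called), it prints the count to use for -Z.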
In comparing hmmscan and hmmsearch, the authors of HMMER point out in the blog post "hmmscan vs. hmmsearch speed: the numerology":
hmmscan and hmmsearch are doing exactly the same compute, at heart: comparing one profile to one sequence at a time. Their bit score results are identical. You can save hmmsearch tabular output files and use ’em just the same way you were going to use the hmmscan files.
They also point out that hmmsearch is faster because both programs are input-bound and hmmsearch loads less data, but they include this caveat in the post:
(Um, watch out for E-values: remember that E-values depend on the size of the database you search.)
I know that E-values depend on database size. My understanding is that only the size of the target database influences the E-value, and that the target database for hmmscan is the HMM file (PFAM in my case), while for hmmsearch it is the sequence file.
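To make that understanding concrete, here is a minimal sketch. It assumes the per-sequence E-value is simply the P-value of a single profile/sequence comparison multiplied by the search-space size Z, and that -Z overrides Z. The database sizes are made-up illustrative numbers, not real PFAM or assembly counts.

```python
def evalue(p_value, z):
    """E-value under the assumed model: expected false positives = P-value * Z."""
    return p_value * z


# Hypothetical numbers, for illustration only.
p_value = 1e-9          # P-value of one profile/sequence comparison
n_profiles = 20_000     # assumed number of HMMs in the profile database
n_sequences = 500_000   # assumed number of predicted peptides

# hmmscan: the target database is the profile file, so Z = n_profiles.
e_hmmscan = evalue(p_value, n_profiles)

# hmmsearch with defaults: the target database is the sequence file.
e_hmmsearch_default = evalue(p_value, n_sequences)

# hmmsearch with -Z set to the number of profiles.
e_hmmsearch_z = evalue(p_value, n_profiles)

print(f"hmmscan:              {e_hmmscan:.3g}")
print(f"hmmsearch (default):  {e_hmmsearch_default:.3g}")
print(f"hmmsearch (-Z):       {e_hmmsearch_z:.3g}")
```

If this model is right, the same comparison gets the same E-value from hmmscan and from hmmsearch with -Z, and a larger one from hmmsearch with defaults whenever there are more sequences than profiles. Whether the rest of the output (domain E-values, reporting thresholds) also matches is the part I am unsure about.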