Question: Should I use DNA or protein sequences to build an HMM profile?
Hello my friends,

I want to build an HMM profile to search against a DNA FASTA file. I use hmmbuild to build the HMM and hmmsearch to run the search.

However, I am not sure whether I should use DNA or protein sequences to construct the HMM. Since the FASTA file is composed of DNA sequences, I am afraid that a protein HMM profile will not work on it. If I construct a DNA HMM profile, there are other problems, such as the orientation of the protein-coding genes and codon degeneracy.

Do you have any ideas? Thank you for any help!

If you want a model for proteins then build it using protein sequences. If you want to model some sort of ORF finding process that's dependent on the resulting protein sequence (enjoy dealing with splicing) then use the DNA sequence.

HMMER does not support searching DNA with a protein query. This is possible with BLAST (tblastn), although that uses a single protein sequence as the query rather than a profile.

You could translate your FASTA file in all six reading frames. Then you can search the translations with a protein HMM using hmmsearch. Nevertheless, if you get partial hits, I would check the corresponding DNA sequences for insertions, deletions, and introns.
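To make the six-frame idea concrete, here is a minimal sketch in plain Python. It assumes the standard genetic code (NCBI table 1) and translates unknown or ambiguous codons to X; the function names are made up for illustration and are not part of HMMER. For real data you would probably use an existing tool rather than this.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.translate(_COMPLEMENT)[::-1]


def translate(dna):
    """Translate DNA in frame 1; trailing bases that do not fill a codon are dropped."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # X for codons with N or other symbols
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return a dict mapping frame labels (+1, +2, +3, -1, -2, -3) to protein strings."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

Stop codons show up as * in the output, so you would either split the translations at * or leave them in and see how your search tool handles them. Keeping the frame label in each FASTA header makes it easier to map a hit back to DNA coordinates.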

nhmmer, used with a DNA HMM, is also an option. It may be less sensitive, but orientation is not a problem, because the subject sequences are searched on both strands. Another advantage is that insertions and deletions have less effect on the hit length.
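For the nhmmer route, the two steps could look roughly like the sketch below. It only assembles the command lines and runs them if you ask it to. The file names are placeholders, and it assumes you already have a DNA multiple alignment (for example in Stockholm format) and that hmmbuild and nhmmer are on your PATH.

```python
import shlex
import shutil
import subprocess


def nhmmer_commands(dna_alignment, target_fasta,
                    hmm_path="dna_model.hmm", table_path="nhmmer_hits.tbl"):
    """Build the hmmbuild and nhmmer command lines (all file names are placeholders)."""
    # --dna tells hmmbuild to treat the alignment as DNA instead of guessing the alphabet.
    build = ["hmmbuild", "--dna", hmm_path, dna_alignment]
    # --tblout writes a tabular summary of the hits, which includes the strand.
    search = ["nhmmer", "--tblout", table_path, hmm_path, target_fasta]
    return [build, search]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so this works without HMMER installed.
    for cmd in nhmmer_commands("genes_aln.sto", "genome.fa"):
        print(shlex.join(cmd))
```

Call run_commands(...) on the returned list once the input files exist. The protein route is analogous: build the HMM from a protein alignment and run hmmsearch against the six-frame translations.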

