I have 460 amino acid sequences for a specific protein family X. I want to build an HMM from those sequences and then use it to search for homologs / neighbours of X in some bacterial genomes. How can I build the HMM? Is there any software available? What are the steps?
The HMMER software (which is well documented) can be used to produce HMMs from alignments.
As for searching DNA with protein HMMs:
hmmbuild can be used to build nucleotide HMMs from nucleotide alignments. nhmmer then uses them to query DNA.
hmmbuild can likewise be used to build HMMs from amino acid alignments. hmmsearch then uses them to query proteins.
The current HMMER manual (2015, version 3.1) states:
"Still missing: Translated comparisons. We’d of course love to have the HMM equivalents of BLASTX, TBLASTN, and TBLASTX. They’ll come."
(Of course, you could always translate your genome in all six frames, chop it up, and then screen it using protein HMMs. It is a bit ugly, and you might run into some frameshift trouble, but maybe it works?)
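To make the "translate in all frames and chop it up" idea concrete, here is a minimal Python sketch. It is only an illustration under some assumptions: it uses the standard codon table (the bacterial table 11 assigns the same amino acids and differs only in start codons), it "chops" each frame at stop codons, and the function names and the `min_len` cutoff are made up for this example. Note that a real frameshift would still split a gene across two frames.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate from position 0; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def orf_fragments(protein, min_len=30):
    """Chop a translated frame at stop codons ('*'), keeping long pieces."""
    return [frag for frag in protein.split("*") if len(frag) >= min_len]


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

The fragments could then be written to a FASTA file and screened with the protein HMM.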
To give a bit more detail, to build an HMM for a set of proteins, the steps are:
- build a multiple sequence alignment with e.g. Clustal
- run hmmbuild (from the HMMER package) with the multiple sequence alignment as input
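The two steps above, plus a search, could be scripted roughly like this. This is a sketch, not a tested pipeline: it assumes Clustal Omega (`clustalo`) as the aligner, that the search target is a FASTA file of predicted proteins, and all file names are hypothetical. The script only prints the command lines unless you call `run_commands` yourself.

```python
import shlex
import subprocess


def build_commands(fasta, prefix, target_proteins):
    """Return the align, build and search commands as argument lists.

    fasta           -- unaligned family sequences (hypothetical file name)
    prefix          -- prefix for the output files (hypothetical)
    target_proteins -- protein FASTA to search (assumed to exist)
    """
    # Clustal Omega can write Stockholm directly with --outfmt=st.
    align = ["clustalo", "-i", fasta, "-o", f"{prefix}.sto", "--outfmt=st"]
    # hmmbuild takes the output HMM file first, then the alignment.
    build = ["hmmbuild", f"{prefix}.hmm", f"{prefix}.sto"]
    # hmmsearch takes the HMM first, then the sequence database;
    # --tblout saves a tabular per-sequence hit summary.
    search = ["hmmsearch", "--tblout", f"{prefix}_hits.tbl",
              f"{prefix}.hmm", target_proteins]
    return [align, build, search]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in build_commands("familyX.fasta", "familyX", "proteome.faa"):
        print(shlex.join(cmd))
```

Writing Stockholm straight from the aligner sidesteps any format conversion before hmmbuild.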
Thanks. From the manual I learned that I need to give a .sto file as input to get a .hmm file. Does Clustal produce a .sto file as output, or do I have to use different software to convert my file after Clustal?
Cheers
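As far as I remember, hmmbuild can read alignments in several formats besides Stockholm (aligned FASTA and Clustal among them, selectable with the --informat option). Just check the docs for your version.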
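If your version does turn out to need a .sto file, the conversion is simple enough to do by hand. Below is a minimal Python sketch that assumes a plain Clustal-format alignment (header line starting with "CLUSTAL", then blocks of name and sequence, with conservation lines that start with whitespace). The function name is made up for this example, and it writes only the bare Stockholm essentials, with no annotation lines.

```python
def clustal_to_stockholm(clustal_text):
    """Convert Clustal-format alignment text to minimal Stockholm text."""
    lines = clustal_text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("input does not start with a CLUSTAL header")

    seqs = {}  # name -> concatenated aligned sequence, in input order
    for line in lines[1:]:
        # Skip blank lines and conservation lines (they start with spaces).
        if not line.strip() or line[0].isspace():
            continue
        parts = line.split()
        if len(parts) < 2:
            raise ValueError(f"cannot parse alignment line: {line!r}")
        # A trailing residue count, if present, is ignored.
        name, chunk = parts[0], parts[1]
        seqs[name] = seqs.get(name, "") + chunk

    # All rows of an alignment must have the same length.
    if len({len(s) for s in seqs.values()}) != 1:
        raise ValueError("no sequences found or rows differ in length")

    width = max(len(name) for name in seqs)
    out = ["# STOCKHOLM 1.0"]
    out += [f"{name.ljust(width)} {seq}" for name, seq in seqs.items()]
    out.append("//")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = (
        "CLUSTAL W (1.83) multiple sequence alignment\n"
        "\n"
        "seqA    MKT-AY\n"
        "seqB    MKTLAY\n"
        "        ***.**\n"
        "\n"
        "seqA    IAK\n"
        "seqB    I-K\n"
    )
    print(clustal_to_stockholm(example), end="")
```

Established converters handle more edge cases than this, so treat it as a fallback rather than the recommended route.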