Can anyone help me with Multiple Sequence Alignment (MSA) using Hidden Markov Models (HMMs), by giving an example or a reference other than these two:
1- Eddy, Sean R., et al. Multiple alignment using hidden Markov models.
2- Boer, Jonas. Multiple alignment using hidden Markov models. Seminar Hot Topics in Bioinformatics.
I know that a profile HMM has three kinds of states (match, insertion and deletion) and that its emission and transition probabilities can be learned, for example with Viterbi training. What is unclear to me is the circularity: to build a multiple alignment I need an HMM, but to build an HMM I need aligned sequences, and my sequences are unaligned. I also understand that simulated annealing introduces randomness into training, which can lead to better solutions, and that this approach is different from the EM algorithm.
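The circularity is easier to see from the easy direction first: given an alignment, a profile HMM's match states and their emission probabilities can be read off by counting. Below is a minimal sketch, assuming a DNA alphabet, '-' as the gap character, the common heuristic that a column with fewer than 50% gaps becomes a match state, and add-one pseudocounts. The function names are hypothetical, and transition probabilities (estimated the same way, by counting M→M, M→I, M→D moves and so on) are left out for brevity.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumed DNA alphabet; use the 20 amino acids for proteins


def match_columns(alignment, max_gap_fraction=0.5):
    """Indices of alignment columns treated as match states: columns in
    which fewer than `max_gap_fraction` of the sequences have a gap '-'."""
    n_seqs = len(alignment)
    return [
        j
        for j in range(len(alignment[0]))
        if sum(seq[j] == "-" for seq in alignment) / n_seqs < max_gap_fraction
    ]


def match_emissions(alignment, pseudocount=1.0):
    """One emission distribution per match state, estimated from residue
    counts in that column plus an additive pseudocount per residue."""
    emissions = []
    for j in match_columns(alignment):
        counts = Counter(seq[j] for seq in alignment if seq[j] != "-")
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        emissions.append({a: (counts[a] + pseudocount) / total for a in ALPHABET})
    return emissions


aligned = ["AC-GT", "ACTGT", "A--GT", "TC-GA"]
print(match_columns(aligned))        # [0, 1, 3, 4]
print(match_emissions(aligned)[0])   # {'A': 0.5, 'C': 0.125, 'G': 0.125, 'T': 0.25}
```

Training from unaligned sequences runs this in a loop: align the sequences to the current model, re-estimate the counts from that alignment, and repeat. Viterbi training, EM (Baum-Welch) and simulated annealing differ in how that alignment step is done.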
I have another question: how many states should the HMM have at the first step? Does the number of states change during training until convergence, or is it fixed from the start?
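In the standard profile HMM architecture the state layout is determined by one number, the model length L (the number of match states): L match states M1..ML, L delete states D1..DL, L + 1 insert states I0..IL, plus Begin and End, i.e. 3L + 3 states. So the question reduces to choosing L. A minimal sketch, assuming the rounded average length of the unaligned sequences as the starting L (a common heuristic, but an assumption here); both function names are hypothetical. Whether L then stays fixed depends on the training method: it can simply be kept fixed, while some procedures adjust it during training by adding match states where insert states are heavily used and removing match states that are mostly skipped by deletions.

```python
def profile_hmm_states(model_length):
    """State names of a profile HMM with `model_length` match states, in the
    common layout Begin, End, M1..ML, D1..DL, I0..IL (3L + 3 states)."""
    states = ["B", "E"]
    states += [f"M{k}" for k in range(1, model_length + 1)]
    states += [f"D{k}" for k in range(1, model_length + 1)]
    states += [f"I{k}" for k in range(0, model_length + 1)]
    return states


def initial_model_length(sequences):
    """Hypothetical starting heuristic for unaligned input: the rounded
    average sequence length becomes the number of match states."""
    return round(sum(len(s) for s in sequences) / len(sequences))


seqs = ["ACGT", "ACTGT", "AGT", "TCGA"]
L = initial_model_length(seqs)
print(L, len(profile_hmm_states(L)))  # 4 15
```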
I would appreciate any help understanding what really happens when HMMs are used for MSA.
For context: many DNA, RNA and protein sequences have been found, but much less is known about the structure and function of each protein. We therefore use MSA to find similarities between sequences, to determine whether they are homologous (share a common ancestor), and to infer the unknown structures and functions of sequences.
Thanks to your answer, I know that I can derive an HMM from aligned sequences. However, Eddy's paper states in its abstract: "A simulated annealing method is described for training HMM and producing MSAs from initially unaligned protein or DNA sequences".
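The point of the abstract is that no starting alignment is needed: the model is trained directly on unaligned sequences, and annealing controls how much randomness enters each training step. One way to illustrate the idea (a simplified sketch of the general technique, not the exact procedure in Eddy's paper) is to score candidate choices, such as alternative alignments of one sequence to the current model, and sample among them with probability proportional to P^(1/T) for a temperature T that is lowered over time. At high T the choices are close to random, which helps escape poor local optima; as T approaches 0 this reduces to always taking the best (Viterbi) choice. EM (Baum-Welch), by contrast, deterministically averages over all paths and can converge to a local optimum. The function names and the three candidate probabilities are hypothetical.

```python
import math
import random


def annealed_probs(log_probs, temperature):
    """Probabilities proportional to P**(1/T) = exp(log P / T).
    Subtracting the maximum before exponentiating avoids underflow."""
    scaled = [lp / temperature for lp in log_probs]
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def annealed_choice(log_probs, temperature, rng=random):
    """Sample one candidate index (e.g. one candidate alignment of a
    sequence to the current model) from the annealed distribution."""
    probs = annealed_probs(log_probs, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical probabilities of three candidate alignments of one sequence
log_p = [math.log(0.6), math.log(0.3), math.log(0.1)]
for T in (5.0, 1.0, 0.1):  # a decreasing temperature schedule
    print(T, [round(p, 3) for p in annealed_probs(log_p, T)])
print(annealed_choice(log_p, 1.0, random.Random(0)))
```

At T = 1 the original probabilities are reproduced; at T = 5 they are flattened toward uniform, and at T = 0.1 almost all of the mass goes to the best candidate.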
So in Eddy's method, simulated annealing is used to infer the HMM parameters directly from unaligned sequences. You can use an HMM to model a set of unaligned sequences and then use this HMM to produce a multiple sequence alignment. The problem with this approach, however, is that it requires a lot of sequences, which is why one usually starts from an existing alignment. You can read more on using HMMs for multiple sequence alignments here.
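To make the step "use this HMM to produce a multiple sequence alignment" concrete: once a model exists (whether built from an alignment or trained by annealing), each sequence is aligned to it independently with the Viterbi algorithm, and residues assigned to the same match state form one column of the MSA. Below is a minimal toy sketch under several assumptions: a DNA alphabet, a single transition table shared by all model positions, uniform insert emissions, and End reached with probability 1. `viterbi_profile`, `to_msa` and all model numbers are hypothetical; real tools such as HMMER's hmmalign use more elaborate models.

```python
import math

NEG_INF = float("-inf")


def viterbi_profile(seq, match_emis, trans, alphabet="ACGT"):
    """Most probable state path of `seq` through a toy profile HMM.

    match_emis: one emission dict per match state M1..ML.
    trans: {(from_type, to_type): prob} over types 'M', 'I', 'D', shared by
      every model position (a simplifying assumption).
    Insert states emit uniformly over `alphabet` (also assumed); Begin is
    treated as M0 and End is reached with probability 1.
    Returns (log_prob, [(state, residue), ...]).
    """
    L, n = len(match_emis), len(seq)
    lt = {k: (math.log(v) if v > 0 else NEG_INF) for k, v in trans.items()}
    ins_emit = math.log(1.0 / len(alphabet))
    # V[t][j][i]: best log-prob of a path ending in state t_j after seq[:i]
    V = {t: [[NEG_INF] * (n + 1) for _ in range(L + 1)] for t in "MID"}
    ptr = {t: [[None] * (n + 1) for _ in range(L + 1)] for t in "MID"}
    V["M"][0][0] = 0.0  # Begin state, treated as M0

    def best(cands):
        return max(cands, key=lambda c: c[0])

    for j in range(L + 1):
        for i in range(n + 1):
            if j > 0 and i > 0:  # match: consume a residue and a model position
                e = match_emis[j - 1].get(seq[i - 1], 0.0)
                s, p = best([(V[t][j - 1][i - 1] + lt[(t, "M")], t) for t in "MID"])
                V["M"][j][i] = s + math.log(e) if e > 0 else NEG_INF
                ptr["M"][j][i] = p
            if i > 0:  # insert: consume a residue, stay at model position j
                s, p = best([(V[t][j][i - 1] + lt[(t, "I")], t) for t in "MID"])
                V["I"][j][i], ptr["I"][j][i] = s + ins_emit, p
            if j > 0:  # delete: skip model position j without emitting
                s, p = best([(V[t][j - 1][i] + lt[(t, "D")], t) for t in "MID"])
                V["D"][j][i], ptr["D"][j][i] = s, p

    score, state = best([(V[t][L][n], t) for t in "MID"])
    if score == NEG_INF:
        return score, []
    path, j, i = [], L, n
    while (j, i) != (0, 0):  # trace back to Begin (M0 at position 0)
        prev = ptr[state][j][i]
        if state == "M":
            path.append((f"M{j}", seq[i - 1]))
            j, i = j - 1, i - 1
        elif state == "I":
            path.append((f"I{j}", seq[i - 1]))
            i -= 1
        else:
            path.append((f"D{j}", "-"))
            j -= 1
        state = prev
    return score, path[::-1]


def to_msa(paths, L):
    """Stack paths into rows: one column per match position (residue, or '-'
    for a delete state); lowercase insert residues are padded with '.'."""
    parsed, width = [], [0] * (L + 1)
    for path in paths:
        core, ins = ["-"] * L, [""] * (L + 1)
        for state, res in path:
            k = int(state[1:])
            if state[0] == "I":
                ins[k] += res.lower()
            else:
                core[k - 1] = res
        parsed.append((core, ins))
        width = [max(w, len(s)) for w, s in zip(width, ins)]
    rows = []
    for core, ins in parsed:
        row = ins[0].ljust(width[0], ".")
        for k in range(1, L + 1):
            row += core[k - 1] + ins[k].ljust(width[k], ".")
        rows.append(row)
    return rows


# Hypothetical 3-position model favouring A, C, G, and hypothetical transitions
model = [{a: (0.85 if a == r else 0.05) for a in "ACGT"} for r in "ACG"]
trans = {("M", "M"): 0.8, ("M", "I"): 0.1, ("M", "D"): 0.1,
         ("I", "M"): 0.6, ("I", "I"): 0.3, ("I", "D"): 0.1,
         ("D", "M"): 0.6, ("D", "I"): 0.1, ("D", "D"): 0.3}
paths = [viterbi_profile(s, model, trans)[1] for s in ["ACG", "AG", "ATCG"]]
for row in to_msa(paths, len(model)):
    print(row)
```

With these numbers the rows come out as A.CG, A.-G and AtCG: "AG" skips the second model position through a delete state, and the extra T in "ATCG" goes to insert state I1, so the A, C and G columns still line up.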