Pair HMMs are widely used in DNA sequence alignment.

For each HMM, there are 3 sets of characteristic parameters:

1) transition probabilities

2) emission probabilities

3) initial probabilities

I understand how to solve the alignment problem once we have an HMM.

But I don't know how we get the parameters of an HMM in the first place.

Can anyone give me an answer or references to related material?

Thanks!

In chapter 3 of BSA, the Baum-Welch algorithm is introduced to estimate the parameters of an HMM from a number of raw (unaligned) sequences. These sequences are assumed to be representative of whatever the model is meant to describe. Chapter 4 introduces pair HMMs, and the confusion here is not about whether Baum-Welch can be used to find those parameters. The question is: where do we get the aligned sequences to train on, and what does that choice imply about the resulting probabilities when we test whether two other sequences are related? As far as I can tell, the book doesn't address this question.

My guess is that some repository of alignments, for instance Pfam, must be used to derive the p's and q's. But those are multiple alignments.
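
To make the guess concrete, here is a rough sketch of how the p's and q's (and the gap parameters) could be counted off trusted alignments. It assumes the simple three-state pair HMM (match state M, insert states X and Y) with gap-open probability delta and gap-extend probability epsilon, and ignores the end state. The function names, the pseudocount, and the DNA alphabet are my own choices for illustration and are not taken from the book. A multiple alignment can be reduced to pairwise alignments by taking two rows at a time and dropping columns where both rows have a gap, which is what `pairs_from_msa` does.

```python
from collections import Counter
from itertools import combinations

def column_state(a, b):
    """Classify one aligned column as 'M', 'X', 'Y', or None (both gaps)."""
    if a == "-" and b == "-":
        return None   # empty in both rows: not part of the pairwise alignment
    if b == "-":
        return "X"    # residue only in the first sequence
    if a == "-":
        return "Y"    # residue only in the second sequence
    return "M"        # residue in both sequences

def pairs_from_msa(rows):
    """Project a multiple alignment (list of equal-length rows) onto all row pairs."""
    return list(combinations(rows, 2))

def estimate_pair_hmm(aligned_pairs, alphabet="ACGT", pseudocount=1.0):
    """Count-based estimates of p_ab, q_a, delta, epsilon (hypothetical helper).

    Assumes rows contain only symbols from `alphabet` plus '-'.
    """
    match, single, trans = Counter(), Counter(), Counter()
    for row_a, row_b in aligned_pairs:
        prev = None
        for a, b in zip(row_a, row_b):
            state = column_state(a, b)
            if state is None:
                continue
            if state == "M":
                match[(a, b)] += 1
            else:
                single[a if state == "X" else b] += 1
            if prev is not None:
                trans[(prev, state)] += 1
            prev = state

    # Emission probabilities, smoothed with a pseudocount so nothing is zero.
    total_match = sum(match[(a, b)] + pseudocount for a in alphabet for b in alphabet)
    p = {(a, b): (match[(a, b)] + pseudocount) / total_match
         for a in alphabet for b in alphabet}
    total_single = sum(single[a] + pseudocount for a in alphabet)
    q = {a: (single[a] + pseudocount) / total_single for a in alphabet}

    # delta: chance of leaving M into one particular gap state (X and Y pooled).
    from_m = sum(trans[("M", s)] for s in "MXY")
    delta = (trans[("M", "X")] + trans[("M", "Y")]) / (2 * from_m) if from_m else 0.0
    # epsilon: chance of staying in a gap state once inside it.
    from_gap = sum(trans[(g, s)] for g in "XY" for s in "MXY")
    epsilon = (trans[("X", "X")] + trans[("Y", "Y")]) / from_gap if from_gap else 0.0

    return {"p": p, "q": q, "delta": delta, "epsilon": epsilon}
```

For example, `estimate_pair_hmm(pairs_from_msa(rows))` would give one parameter set for a family alignment `rows`. This also makes the worry in the question visible: whatever comes out is only as meaningful as the alignments that went in, since the estimates reflect the divergence level and composition of the training pairs.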