Statistical Significance of Hidden Markov Models
8.4 years ago
JW ▴ 10

Hello,

I have created two Hidden Markov Models based on two sets of sequences using the Baum-Welch algorithm. Now, I would like to know whether those models differ significantly. Also, if I have a third sequence, I would like to know which HMM it is closer to (and, of course, whether that difference is statistically significant).

If anyone knows a way to do this, please feel free to drop a hint.

statistics HMM R sequence-analysis • 3.2k views
8.4 years ago
matted 7.8k

HMMs are generative models of sequences, so you can use pretty standard methods to compute and compare likelihoods.

For your third sequence, you can compute the likelihood of observing the sequence with each HMM, and compare the two likelihoods to find the best model.
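To make the comparison concrete, here is a minimal sketch of the scaled forward algorithm for a discrete-emission HMM, in plain Python. The two models, their parameters, and the test sequence are made up for illustration. In practice you would plug in the start, transition, and emission probabilities that Baum-Welch gave you.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    n = len(start)
    # Joint probability of the first symbol and each hidden state.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    scale = sum(alpha)
    if scale == 0.0:
        return float("-inf")
    loglik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for symbol in obs[1:]:
        # Propagate one step through the transitions, then emit the symbol.
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
            for j in range(n)
        ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        # Rescaling avoids underflow; the log of each scale factor accumulates.
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik

# Hypothetical 2-state models over the alphabet {0, 1} (illustration only).
model_a = dict(start=[0.5, 0.5],
               trans=[[0.9, 0.1], [0.1, 0.9]],
               emit=[[0.9, 0.1], [0.1, 0.9]])
model_b = dict(start=[0.5, 0.5],
               trans=[[0.5, 0.5], [0.5, 0.5]],
               emit=[[0.5, 0.5], [0.5, 0.5]])

third_sequence = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
ll_a = log_likelihood(third_sequence, **model_a)
ll_b = log_likelihood(third_sequence, **model_b)
log_ratio = ll_a - ll_b  # positive favours model_a, negative favours model_b
print(ll_a, ll_b, log_ratio)
```

The log ratio tells you which model explains the sequence better, but it is a score, not a p-value. Turning it into a significance statement would need some reference distribution for the score, which this sketch does not attempt.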

For comparing the two HMMs, I would combine your two sets of sequences and train a single shared HMM on that unified dataset. Then, you can calculate the likelihood of the data under that null model and also under the alternative of two separate models. You can then use a likelihood ratio test to judge significance. In other words, you want to see if two separate models fit the data much better than a single shared model. If a single shared model is almost as good, then the two models are not significantly different.
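As a rough sketch of the likelihood ratio test, assuming all three HMMs (shared, set 1, set 2) have the same number of states and the same alphabet, the extra degrees of freedom equal the free-parameter count of one HMM. The log-likelihood numbers below are placeholders, not real results, and the helper names are my own. The chi-square approximation is only asymptotic and Baum-Welch can stop in local optima, so treat the p-value as a guide rather than an exact answer.

```python
import math

def chi2_sf(x, df):
    """Upper tail P(X >= x) of a chi-square distribution, integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from Q(1/2, y) (odd df) or Q(1, y) (even df), then apply
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)

def hmm_free_parameters(n_states, n_symbols):
    """Free parameters of a discrete HMM (each probability row sums to 1)."""
    transitions = n_states * (n_states - 1)
    emissions = n_states * (n_symbols - 1)
    initial = n_states - 1
    return transitions + emissions + initial

def likelihood_ratio_test(ll_shared, ll_set1, ll_set2, n_states, n_symbols):
    """Null: one shared HMM. Alternative: one HMM per sequence set."""
    statistic = 2.0 * ((ll_set1 + ll_set2) - ll_shared)
    # Assumption: the alternative has exactly one extra HMM's worth of parameters.
    df = hmm_free_parameters(n_states, n_symbols)
    return statistic, df, chi2_sf(statistic, df)

# Placeholder log-likelihoods (hypothetical values, for illustration only):
# ll_shared = shared model on the combined data,
# ll_set1 / ll_set2 = each separate model on its own training set.
statistic, df, p_value = likelihood_ratio_test(
    ll_shared=-1250.0, ll_set1=-600.0, ll_set2=-620.0,
    n_states=2, n_symbols=4,
)
print(statistic, df, p_value)
```

A small p-value suggests that the two separate models fit noticeably better than the shared one. If the statistic comes out negative, the separate models were probably trained to a worse local optimum than the shared one and should be retrained.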


Oh yeah that helped a lot! Thanks :)

8.4 years ago

You can use hmmscan (part of the HMMER package) to search a sequence against a database of profile HMMs. As far as I know, there's no commonly accepted way of measuring similarity between HMMs. Some ideas: use the Kullback-Leibler divergence (which is difficult to compute for HMMs), or use a measure based on Viterbi scores.
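Since the Kullback-Leibler divergence has no closed form for HMMs, one workaround is a Monte Carlo estimate: sample sequences from one model and average the per-symbol log-likelihood difference between the two models. The sketch below is plain Python with made-up discrete models. The function names, sample sizes, and parameters are my own choices, not anything from HMMER.

```python
import math
import random

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik

def sample_sequence(length, start, trans, emit, rng):
    """Draw one observation sequence from a discrete HMM."""
    states = list(range(len(start)))
    symbols = list(range(len(emit[0])))
    state = rng.choices(states, weights=start)[0]
    obs = []
    for _ in range(length):
        obs.append(rng.choices(symbols, weights=emit[state])[0])
        state = rng.choices(states, weights=trans[state])[0]
    return obs

def kl_rate_estimate(model_p, model_q, n_sequences=200, length=100, seed=0):
    """Monte Carlo estimate of the per-symbol KL divergence from P to Q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sequences):
        obs = sample_sequence(length, rng=rng, **model_p)
        total += log_likelihood(obs, **model_p) - log_likelihood(obs, **model_q)
    return total / (n_sequences * length)

# Hypothetical 2-state models over the alphabet {0, 1} (illustration only).
model_a = dict(start=[0.5, 0.5],
               trans=[[0.9, 0.1], [0.1, 0.9]],
               emit=[[0.9, 0.1], [0.1, 0.9]])
model_b = dict(start=[0.5, 0.5],
               trans=[[0.5, 0.5], [0.5, 0.5]],
               emit=[[0.5, 0.5], [0.5, 0.5]])

kl_ab = kl_rate_estimate(model_a, model_b)
kl_ba = kl_rate_estimate(model_b, model_a)
print(kl_ab, kl_ba)
```

Note that the divergence is not symmetric, so the two directions generally give different numbers. If you want a single distance-like value, averaging the two is a common choice.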