I am currently taking a bioinformatics class, and we are covering the Viterbi algorithm. To better understand the algorithm, I have been reading various resources. Wikipedia has a fairly simple Java implementation: http://en.wikipedia.org/wiki/Viterbi_algorithm. I would like to implement something like this myself, but I'm not sure whether that example is correct, and I was wondering if someone could explain why it is or is not.
If I run the example provided with the code, where:
states = ('Rainy', 'Sunny')
observations = ('walk', 'shop', 'clean')
start_probability = {'Rainy': 0.6, 'Sunny': 0.4}
transition_probability = {
    'Rainy' : {'Rainy': 0.7, 'Sunny': 0.3},
    'Sunny' : {'Rainy': 0.4, 'Sunny': 0.6},
}
emission_probability = {
    'Rainy' : {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},
    'Sunny' : {'walk': 0.6, 'shop': 0.3, 'clean': 0.1},
}
then the output I get says that the most likely sequence of states was {Sunny,Rainy,Rainy,Rainy}. Why do I get four states if there were only three observations? Thanks for your time!
Isn't it just that you get a predefined "start state" at the beginning, and this is why you have 4 states (1 start state plus 3 states for the observations)?
I have played around with the same example you show here and, if my memory is correct, it worked correctly for me.
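For comparison, here is a minimal sketch of a textbook-style Viterbi in Python. It is not the Wikipedia code; the function name viterbi and its argument order are my own choices for illustration. In this formulation the returned path has exactly one state per observation, so an implementation that returns four entries for three observations is probably including one extra state (such as a start state) in its output.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely state sequence.

    Hypothetical helper for illustration; assumes at least one observation.
    """
    # best[t][s] = (probability of the best path ending in state s at time t,
    #               the previous state on that path, or None at t = 0)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]

    for obs in observations[1:]:
        previous = best[-1]
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path probability into s.
            prob, prev_state = max(
                (previous[r][0] * trans_p[r][s], r) for r in states
            )
            column[s] = (prob * emit_p[s][obs], prev_state)
        best.append(column)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return prob, path


states = ('Rainy', 'Sunny')
observations = ('walk', 'shop', 'clean')
start_probability = {'Rainy': 0.6, 'Sunny': 0.4}
transition_probability = {
    'Rainy': {'Rainy': 0.7, 'Sunny': 0.3},
    'Sunny': {'Rainy': 0.4, 'Sunny': 0.6},
}
emission_probability = {
    'Rainy': {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},
    'Sunny': {'walk': 0.6, 'shop': 0.3, 'clean': 0.1},
}

prob, path = viterbi(observations, states, start_probability,
                     transition_probability, emission_probability)
print(path)  # ['Sunny', 'Rainy', 'Rainy']
print(prob)  # about 0.01344
```

Working the numbers by hand gives the same three-state path: after 'walk' the best scores are Rainy 0.06 and Sunny 0.24, after 'shop' they are Rainy 0.0384 and Sunny 0.0432, and after 'clean' they are Rainy 0.01344 and Sunny 0.002592, so backtracking from Rainy gives Sunny, Rainy, Rainy.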
Also, for more examples involving biology, I can really recommend the following book: http://www.amazon.com/Biological-Sequence-Analysis-Probabilistic-Proteins/dp/0521629713