Question

Not Sure If I'M Getting The Right Output From Viterbi Algorithm

2

Entering edit mode

12.9 years ago

Doug ▴ 20

I am currently in a bioinformatics class and we are covering the Viterbi algorithm. To try and better understand the algorithm I have been reading various resources. On wikipedia, there is a pretty simple Java implementation of the code: http://en.wikipedia.org/wiki/Viterbi_algorithm. I am interested in implementing code like this on my own, but I'm not sure if that example is correct and was wondering if someone could explain why it is or why it is not.

If I run with the example provided in the code, where:

states = ('Rainy', 'Sunny')

observations = ('walk', 'shop', 'clean')

start_probability = {'Rainy': 0.6, 'Sunny': 0.4}

transition_probability = {
   'Rainy' : {'Rainy': 0.7, 'Sunny': 0.3},
   'Sunny' : {'Rainy': 0.4, 'Sunny': 0.6},
}

emission_probability = {
   'Rainy' : {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},
   'Sunny' : {'walk': 0.6, 'shop': 0.3, 'clean': 0.1},
}

Then the output that I get is that most likely series of events was {Sunny,Rainy,Rainy,Rainy}. Why do I get four events, if there were only three emissions? Thanks for your time!

algorithm java • 4.3k views

ADD COMMENT • link updated 12.9 years ago by Gjain 5.8k • written 12.9 years ago by Doug ▴ 20

0

Entering edit mode

Isn't just that you get a beginning "start state" that is predefined and this why you have 4 states (so 1 start state and 3 states for the observations)?

Have played around with the same example as you show here and if my memory is correct it was working correctly for me.

Also, for more examples that have biology in it I can really recommend the following book: http://www.amazon.com/Biological-Sequence-Analysis-Probabilistic-Proteins/dp/0521629713

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 12.9 years ago by Sander Timmer ▴ 710

Ram · Answer 1 · 2011-12-05

3

Entering edit mode

12.9 years ago

Gjain 5.8k

This link helped me to understand the basics of viterbi algorithm and how to implement it. Its generic, so you can implement it in the language of your choice.

If you go through each section, example and summary, you should be able to figure out and make this example working.

Hope this helps.

ADD COMMENT • link 12.9 years ago by Gjain 5.8k

2

Entering edit mode

I've been through those pages, they're very good.

ADD REPLY • link 12.9 years ago by User 59 13k

0

Entering edit mode

Thanks Daniel, they really helped me understand the concept when I was studying and now when I use them practically.

ADD REPLY • link 12.9 years ago by Gjain 5.8k

0

Entering edit mode

Nice resource. I just read over it. One question I have is how does viterbi algorithm apply to sequence prediction? I can see how the forward algorithm is used in our predictions. But is there any practical use for the viterbi algorithm where you want to get the most likely sequence from a set of hidden states using an observed sequence? Maybe I am just interpreting it wrong.

ADD REPLY • link 12.9 years ago by Damian Kao 16k

0

Entering edit mode

nope, you are correct and what you said is basically one of the ways of gene identification. This link should make it clear. http://www.nature.com/nbt/journal/v22/n10/full/nbt1004-1315.html

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 12.9 years ago by Gjain 5.8k

score 2 · Answer 2 · 2011-12-05

Hi,

If it gives you 4 states with 3 observations there's a bug.

I ran the wiki's python code with wiki's example data and obtain exactly the result which is on that page : ['Sunny', 'Rainy', 'Rainy']

Hence, one possibility you have is to write your own Java version translating their Python one, which seems to be correct.

A nice article I studied when I was writing my own implementation of HMM routines is: "A revealing introduction to hidden markov models" by Mark Stamp, you can google for a pdf. (it does not explain Viterbi, but some important background material). Viterbi is covered in Bishop's book "Pattern Recognition and Machine learning", for example.

Alternatively, if you send me an email address I can forward you the code of my C implementation.