Entering edit mode

3.4 years ago

anasjamshed
▴
140

The following figure shows an HMM with two states a and b. When HMM is in the state a, it is more likely to emit purines (A and G). When in state b, it is more likely to emit pyrimidines (C and T). Decode the most probable sequence of states (a / b) for the GGCT sequence. Use logarithmic scores instead of regular odds scores.

I have made this table :

```
A G T C Mean Length
α 2/5 2/5 1/10 1/10 9/10
β 1/5 1/5 3/10 3/10 9/10
```

But I don't know how to decode GGCT sequence and how can I do it in python?

Some hints to solve your homework:

Search for Viterbi algorithm if you want to program a general solution. You also need to specify the matrix of state-transition probabilities.

To quickly solve this particular example on paper, you can enumerate all possible solutions. All transition probabilities are symmetric, so there are only three possible cases: 1. emit all symbols from alpha (transition prob. 0.9 for each symbol), 2. emit all symbols from beta, 3. start in alpha, emit GG, transition to beta (costing a lot), then emit CT. Write up the product of probabilities (transition * emision) for each path and choose the path with highest probability.

I propose you manage the log transformation and the other paths yourself.