I am working on copy number analysis and want to apply a hidden Markov model (HMM) to my data.
Say I have data for 1 individual with ~60k windows. For each window I know whether it was called as a gain, a loss, or normal copy number. For example (chromosome, start, end, call):
chr1 0 100 Loss
chr1 500 600 Loss
chr1 600 700 Gain
What I want to do:
I want to find out whether the observed state of any window is due to error, and infer the true state of each window from the states of its neighbouring windows. For example, say I have 10 windows with the following copy number calls:
Loss Loss Loss Loss Normal Loss Loss Loss Loss Loss
In the above example, the call in the 5th window (Normal) is probably an error, so we can set the true state of the 5th window to Loss. (This is only one simple example; there will be many other cases that cannot be decided just by looking.)
What I have understood so far:
I can define my 3 hidden states as Gain, Loss and Normal.
Then I can randomly assign the initial state transition probabilities and observation (emission) probabilities.
Then apply the Baum-Welch algorithm to fit the parameters (to refine my random initial probabilities using the sequence of observed calls in my 60k windows).
Then apply the Viterbi algorithm to get the most likely sequence of true states.
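To make the last step concrete, here is a minimal sketch of Viterbi decoding on my 10-window example. It skips the Baum-Welch fitting entirely: the transition and emission probabilities below are made-up values (a "sticky" 0.9 chance of staying in the same state, and a 0.8 chance that the observed call matches the true state), not anything estimated from my data, and the function name `viterbi` is just my own.

```python
import numpy as np

STATES = ["Gain", "Loss", "Normal"]  # hidden states; observed calls use the same labels

def viterbi(obs, start, trans, emit):
    """Return the most likely hidden-state index path for obs (log-space)."""
    n, k = len(obs), len(start)
    log_t, log_e = np.log(trans), np.log(emit)
    score = np.empty((n, k))             # best log-probability ending in each state
    back = np.zeros((n, k), dtype=int)   # back-pointers for the traceback
    score[0] = np.log(start) + log_e[:, obs[0]]
    for t in range(1, n):
        cand = score[t - 1][:, None] + log_t  # cand[i, j]: move from state i to j
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_e[:, obs[t]]
    path = [int(score[-1].argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Assumed (hand-picked) parameters, purely for illustration.
start = np.full(3, 1 / 3)
trans = np.full((3, 3), 0.05) + np.eye(3) * 0.85  # 0.9 stay, 0.05 to each other state
emit = np.full((3, 3), 0.10) + np.eye(3) * 0.70   # 0.8 correct call, 0.1 each wrong call

calls = ["Loss"] * 4 + ["Normal"] + ["Loss"] * 5
obs = [STATES.index(c) for c in calls]
decoded = [STATES[i] for i in viterbi(obs, start, trans, emit)]
print(decoded)
```

With these assumed values the isolated Normal call in the 5th window is decoded as Loss, which is the behaviour I am hoping for; with less sticky transitions or more trusted calls the result could differ.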
Do you think it is appropriate to apply an HMM to my data, or have I misunderstood something and it is not a good idea?
If an HMM will work, can somebody tell me whether I need to change anything in the steps above?
Although Baum-Welch will be used for fitting, I have a bad feeling about assigning the probabilities randomly at the beginning. Is there a better way to initialise them?
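One alternative to random starting values that I could imagine (this is my own assumption, not something I have validated) is to seed the transition matrix from the observed calls themselves: count how often each call follows another in consecutive windows, add a pseudocount so that no transition starts at zero, and normalise each row. The function name `estimate_transitions` and the pseudocount of 1 are made up for illustration.

```python
import numpy as np

STATES = ["Gain", "Loss", "Normal"]

def estimate_transitions(calls, pseudocount=1.0):
    """Row-normalised counts of call-to-call changes between consecutive windows."""
    index = {s: i for i, s in enumerate(STATES)}
    # Start every cell at the pseudocount so unseen transitions are not impossible.
    counts = np.full((len(STATES), len(STATES)), pseudocount)
    for prev, curr in zip(calls, calls[1:]):
        counts[index[prev], index[curr]] += 1
    return counts / counts.sum(axis=1, keepdims=True)

calls = ["Loss"] * 4 + ["Normal"] + ["Loss"] * 5
init_trans = estimate_transitions(calls)
print(np.round(init_trans, 3))
```

This sketch treats every pair of neighbouring entries as consecutive; on real data I would presumably have to avoid counting pairs that span different chromosomes or large gaps between windows.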
P.S.: Please let me know if I should post this question on the stats Stack Exchange instead; I thought it made more sense to post it here (biological data + algorithms).
Thanks in advance,
EDIT: If you think this problem can be solved with some other algorithm or procedure, please let me know.