Where to start:

Are you trying to understand HMMs in general, or a specific application? If the latter, post a source and I'll try to explain it.

## Basics about Hidden Markov Models:

An HMM is a so-called generative model (meaning it describes a process that generates your data).
You can imagine an HMM as a model with a given number of states. You start in a specific state with a given probability and transition from one state to another with a given probability. While you are in a state, you emit symbols with a given probability. This process generates your sequential data.

*Example:
This example is a classic, so no credit to me. It is called "the occasionally dishonest casino". You are in a casino playing a game. A die is rolled, and every time it shows a six you win; otherwise you lose. Naturally the casino wants to maximize its profit, so it sometimes uses a loaded die with a lower probability of showing a six.
The loaded and the fair die represent two states (each with its own probabilities of emitting the numbers 1-6). The casino switches between the dice with a given transition probability.*

What you observe is the outcome of each roll. What you do not observe is which die was used (the state). That's why the sequence of states during the process is called hidden.
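To make the generative view concrete, here is a minimal Python sketch that samples rolls from a two-state casino model. All probabilities below are made-up values for illustration (the post does not give any), and the names `draw` and `generate` are hypothetical helpers.

```python
import random

# Hypothetical parameters for illustration only (not from the post).
STATES = ("fair", "loaded")
START = {"fair": 0.5, "loaded": 0.5}
TRANS = {
    "fair": {"fair": 0.95, "loaded": 0.05},
    "loaded": {"fair": 0.10, "loaded": 0.90},
}
# The loaded die shows a six less often, as in the casino story.
EMIT = {
    "fair": {face: 1 / 6 for face in range(1, 7)},
    "loaded": {1: 0.19, 2: 0.19, 3: 0.19, 4: 0.19, 5: 0.19, 6: 0.05},
}


def draw(dist, rng):
    """Draw one key from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def generate(n_rolls, seed=0):
    """Return (hidden_states, observed_rolls) for n_rolls steps."""
    rng = random.Random(seed)
    states, rolls = [], []
    state = draw(START, rng)               # pick the starting die
    for _ in range(n_rolls):
        states.append(state)
        rolls.append(draw(EMIT[state], rng))   # emit a symbol
        state = draw(TRANS[state], rng)        # maybe switch dice
    return states, rolls


states, rolls = generate(20)
print("observed:", rolls)    # what the player sees
print("hidden:  ", states)   # what only the casino knows
```

A player only ever sees the `rolls` list; recovering something about `states` from it is what the algorithms used with HMMs are for.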

Back to your post:

Your second assumptions need to be more specific:

**Markov property:**
The Markov property states that a process is "memoryless". This statement is often described incorrectly. Most of the time people say something like: "the state you are in only depends on the state you were in one step before". THAT IS WRONG. The correct statement would be: "given the last state you were in before you entered your current state, the state you are in does not depend on any of its earlier predecessors". (Sorry for this sentence, but people smarter than me have tried to formulate this difficult statement and failed.) It is often stated that Markov models of lower order cannot model higher-order dependencies. That is also wrong.
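One way to write the corrected statement down, using $\pi_i$ for the state at step $i$ (a notation assumed here, not taken from the post):

$$P(\pi_i \mid \pi_{i-1}, \pi_{i-2}, \ldots, \pi_1) = P(\pi_i \mid \pi_{i-1})$$

This is a statement about conditional independence only. Without conditioning on $\pi_{i-1}$, the state $\pi_i$ generally still depends on earlier states, for example

$$P(\pi_i \mid \pi_{i-2}) = \sum_{\pi_{i-1}} P(\pi_i \mid \pi_{i-1})\, P(\pi_{i-1} \mid \pi_{i-2}),$$

which is usually not equal to $P(\pi_i)$.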

**Stochastic:** A process is stochastic if its behavior is governed by some kind of randomness. In an HMM this means: your data can be generated by multiple state sequences, and each state sequence has its own probability. Often your data can be generated by every state sequence, so the probabilities sum to one.
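As a sketch of what this looks like in symbols, assume a state path $\pi = \pi_1 \ldots \pi_L$, observed data $x = x_1 \ldots x_L$, transition probabilities $a_{kl}$ and emission probabilities $e_k(b)$ (notation in the style of Durbin et al., chosen here for illustration). The joint probability of the data and one particular state sequence is

$$P(x, \pi) = P(\pi_1)\, e_{\pi_1}(x_1) \prod_{i=2}^{L} a_{\pi_{i-1}\pi_i}\, e_{\pi_i}(x_i),$$

and the likelihood of the data sums over every state sequence that could have produced it:

$$P(x) = \sum_{\pi} P(x, \pi).$$

Under this reading, the quantities that sum to one are the posterior probabilities of the state sequences given the data, $\sum_{\pi} P(\pi \mid x) = 1$.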

**Algorithms often used with HMMs:**

- The Viterbi algorithm gives you the most probable state sequence that could have emitted your data.
- The forward-backward algorithm computes the likelihood of your data given an HMM.
- The Baum-Welch algorithm gives you the parameters of the model that maximize the likelihood (it converges to a local maximum).
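The first two algorithms in the list can be sketched in a few lines of Python. This is a minimal illustration on a two-state casino model, not a reference implementation: the parameter values are invented, the function names are hypothetical, and only the forward half of forward-backward is shown, since that is enough to get the likelihood.

```python
import math

# Hypothetical two-state casino model; all numbers are illustrative assumptions.
STATES = ("fair", "loaded")
START = {"fair": 0.5, "loaded": 0.5}
TRANS = {
    "fair": {"fair": 0.95, "loaded": 0.05},
    "loaded": {"fair": 0.10, "loaded": 0.90},
}
EMIT = {
    "fair": {face: 1 / 6 for face in range(1, 7)},
    "loaded": {1: 0.19, 2: 0.19, 3: 0.19, 4: 0.19, 5: 0.19, 6: 0.05},
}


def forward_likelihood(obs):
    """P(obs | model), summed over all hidden state sequences."""
    f = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for x in obs[1:]:
        f = {s: EMIT[s][x] * sum(f[r] * TRANS[r][s] for r in STATES)
             for s in STATES}
    return sum(f.values())


def viterbi(obs):
    """Most probable hidden state sequence, computed in log space."""
    v = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    backpointers = []
    for x in obs[1:]:
        new_v, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for ending in s at this step.
            best = max(STATES, key=lambda r: v[r] + math.log(TRANS[r][s]))
            ptr[s] = best
            new_v[s] = v[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][x])
        v = new_v
        backpointers.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: v[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


rolls = [1, 3, 2, 5, 4, 1, 2, 3, 6, 6, 6]
print(forward_likelihood(rolls))
print(viterbi(rolls))
```

The forward function multiplies raw probabilities, which is fine for short sequences but underflows on long ones; real implementations work in log space or rescale at each step. Baum-Welch is omitted here because it needs both the forward and backward passes plus an iterative re-estimation loop.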

For a good introduction to HMMs see: BSA ("Biological sequence analysis" by Durbin et al.)

HMMs can be used for so many things. What specific application are you talking about? HMMER?

@lh3 I was speaking for the application of DNA sequencing ... specifically de novo

What is the exact "application of DNA sequencing"? What do you mean by "de novo"? Are you thinking of finding de novo SNPs from a family trio using sequencing data (like Conrad et al. and Roach et al.)? Your question is too vague: it mixes abstract concepts with detailed applications, which is really confusing to me. Even within the small area of sequence analysis, HMMs can model so many things, with the hidden states interpreted very differently. Without a description of your problem, I am not even sure it can be solved by an HMM in the first place. There are things common to all HMMs, but until you thoroughly understand HMMs in one very specific application, there is no way you can understand them in a more general context. Please give the exact biological problem you are thinking about.

@lh3 specifically the HMM application to sequencing an individual human's genome, with multiple reads, where the HMM is trained on those multiple alignments to deduce the consensus genome

@delinquentme I'd second the suggestion to be more specific about the application of HMMs you are interested in understanding. Typically you are not using an HMM to predict nucleotides. As you point out, we observe the nucleotides from our sequencing experiments. The goal of an HMM is to make predictions about some biological property from the observed sequence. For example, we observe a sequence and we want to know where the genes are. The observed data are the nucleotide labels, and the hidden property is "in gene" or "not in gene".

@delinquentme, are you thinking of using an HMM to align sequences? In that case the nucleotide labels in the sequences are the observed data, and the hidden states are whether that position represents an insertion, a deletion, or a substitution.

@charles ... but wouldn't that be deduced from getting multiple coverage .. and simply figuring out which is the most statistically probable sequence?

If you are thinking of inferring a consensus without gaps, I do not see the point of using an HMM. Most simplistic methods will work sufficiently well. HMMs become really powerful when you start to deal with gaps, but my impression is that you are not yet prepared for such complexity. Read my BAQ paper [PMID:21320865]. Not the same, but very relevant.
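As an illustration of what a "simplistic method" for a gap-free consensus might look like, here is a per-column majority vote in Python. It is only a sketch under strong assumptions: the reads are already placed on shared coordinates, there are no insertions or deletions, and `.` marks positions a read does not cover. The function name and the toy reads are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Per-column majority vote over equal-length, gap-free aligned reads.

    '.' marks positions a read does not cover (an assumption of this sketch).
    Columns with no coverage are reported as 'N'.
    """
    length = len(aligned_reads[0])
    if any(len(r) != length for r in aligned_reads):
        raise ValueError("all reads must be padded to the same length")
    consensus = []
    for column in zip(*aligned_reads):
        counts = Counter(base for base in column if base != ".")
        consensus.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(consensus)


reads = [
    "ACGTAC....",
    "..GTTCGA..",
    "....ACGATT",
    "ACGTACGA..",
]
print(majority_consensus(reads))
```

Once reads can contain insertions or deletions, the columns no longer line up and a vote like this breaks down, which is where the hidden states of an HMM start to earn their keep.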


Do you thoroughly understand the few simple examples in Richard Durbin's "Biological sequence analysis"? If not, understand those examples first and then revisit your own questions.


@delinquentme, as lh3 says, you don't need an HMM if you are just piling up reads and throwing out the ones with mismatches. On the other hand, HMMs are useful if you want to start worrying about whether a mismatch between an assembled sequence and a reference genome is a SNP or a sequencing error. Is there some particular program or paper you are trying to understand? HMMs are used to solve all sorts of problems, from speech recognition to sequence alignment, gene finding, and protein structure determination; the details and the vocabulary vary from problem to problem.

@delinquentme, with respect to question one: if you are talking about sequence alignment, the HMM isn't concerned with the different states of the sequence over evolutionary history, except for the assumption that the sequences have a common ancestor. Rather we're looking at how states change as we move from left to right in the sequence. Suppose we are doing sequence alignment with gaps. We can't observe gaps, we can only infer them. Roughly speaking the probability that a position is a gap depends primarily on whether or not the position immediately to its left is a gap.

@Charles is there a succinct text on sequencing, gaps, additions and deletions? I'm a programmer who's pretty new to biology... I'd love to wrap my head around this more