I'm trying to understand some of the details behind how a profile HMM (like the one implemented in HMMER) works. What exactly are the 'hidden' states in the model and how should they be interpreted? For example, if I calculate the most probable sequence of states given some observable protein sequence and a trained model, what exactly does that sequence of states represent?
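In an HMM, the hidden states model the process by which a sequence of observations is generated. In a profile HMM, the states correspond to positions in a multiple alignment and come in three types: match, insert and delete. A match state emits a residue at a conserved column of the alignment, an insert state emits residues that fall between columns, and a delete state emits nothing and skips a column. Maybe this paper can help your understanding. The most probable sequence of states is the most probable path through the hidden states of the model that would produce the observed sequence (the Viterbi path). In other words, it is an alignment of your protein to the profile: it tells you which residues line up with which columns, which residues are insertions, and which columns are deleted.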
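To make the "path = alignment" reading concrete, here is a minimal sketch of Viterbi decoding on a toy profile HMM with three match columns. This is not HMMER's implementation. The four-letter alphabet, all the probabilities and the names (`viterbi_profile`, `match_emit`, `insert_emit`, `trans`) are made up for illustration, the transition probabilities are shared across positions, and I->D and D->I transitions are disallowed for simplicity.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """Log of a probability, with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else NEG_INF

def viterbi_profile(seq, match_emit, insert_emit, trans):
    """Most probable state path of a toy profile HMM for seq, in log space.

    match_emit: one {residue: prob} dict per match column.
    insert_emit: {residue: prob} shared by all insert states.
    trans: probabilities keyed by state-type pair, e.g. "MD" for match -> delete.
    """
    K, n = len(match_emit), len(seq)
    # V[s][j][i]: best log probability of ending in state s at model
    # position j after emitting the first i residues of seq.
    V = {s: [[NEG_INF] * (n + 1) for _ in range(K + 1)] for s in "MID"}
    back = {s: [[None] * (n + 1) for _ in range(K + 1)] for s in "MID"}
    V["M"][0][0] = 0.0  # the begin state is treated as M0

    def best_prev(j, i, target):
        # Best predecessor state type at cell (j, i) for a move into `target`.
        return max((V[p][j][i] + safe_log(trans.get(p + target, 0.0)), p)
                   for p in "MID")

    for j in range(K + 1):
        for i in range(n + 1):
            if j >= 1 and i >= 1:  # match: consumes a column and a residue
                score, p = best_prev(j - 1, i - 1, "M")
                V["M"][j][i] = score + safe_log(match_emit[j - 1].get(seq[i - 1], 0.0))
                back["M"][j][i] = p
            if i >= 1:  # insert: consumes a residue, stays at column j
                score, p = best_prev(j, i - 1, "I")
                V["I"][j][i] = score + safe_log(insert_emit.get(seq[i - 1], 0.0))
                back["I"][j][i] = p
            if j >= 1:  # delete: consumes a column, emits nothing
                V["D"][j][i], back["D"][j][i] = best_prev(j - 1, i, "D")

    # The end state is entered like a match state, but emits nothing.
    score, state = best_prev(K, n, "M")
    if score == NEG_INF:
        raise ValueError("sequence has zero probability under this model")

    # Trace back from the end to the begin state.
    path, j, i = [], K, n
    while (state, j, i) != ("M", 0, 0):
        path.append(f"{state}{j}")
        prev = back[state][j][i]
        if state == "M":
            j, i = j - 1, i - 1
        elif state == "I":
            i -= 1
        else:
            j -= 1
        state = prev
    return path[::-1], score

# Toy model (all numbers hypothetical): columns favour A, C and D in turn.
ALPHABET = "ACDE"
def column(favoured):
    return {a: (0.85 if a == favoured else 0.05) for a in ALPHABET}

match_emit = [column("A"), column("C"), column("D")]
insert_emit = {a: 0.25 for a in ALPHABET}
trans = {"MM": 0.8, "MI": 0.1, "MD": 0.1, "IM": 0.6, "II": 0.4, "DM": 0.7, "DD": 0.3}

for s in ["ACD", "AD", "ACED"]:
    path, _ = viterbi_profile(s, match_emit, insert_emit, trans)
    print(s, "->", " ".join(path))
# ACD  -> M1 M2 M3
# AD   -> M1 D2 M3
# ACED -> M1 M2 I2 M3
```

Reading the paths: for `AD`, column 2 is deleted (`D2`). For `ACED`, the extra `E` is emitted by the insert state between columns 2 and 3 (`I2`). Because the transition table is shared across positions, the probabilities are not properly normalised at the end of the model. That does not affect the decoding logic, but a real profile HMM has position-specific transitions.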
EDIT: Forgot the obligatory reference to the book Biological Sequence Analysis (aka the Durbin book), chapter 3.