I assume that when searching a database for homologous sequences with a profile built by either MEME (represented as a PWM) or a profile HMM, the score used is a log-odds ratio against a random model, i.e. an independent (i.i.d.) process whose emission probabilities are the discrete uniform distribution. If so, what is the point of these probabilistic models estimating the background emission probabilities during training?
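To make my assumption concrete, here is a minimal sketch (my own illustration, not actual MEME or HMMER code) of scoring a sequence window against a PWM as a log-odds ratio with a uniform i.i.d. null model; the toy PWM and the alternative background at the end are made up for the example:

```python
# Minimal sketch: log-odds scoring of a DNA window against a PWM, with a
# uniform i.i.d. background as the null model (hypothetical motif, not MEME output).
import math

ALPHABET = "ACGT"

# Hypothetical 4-column motif: pwm[i][base] = P(base at motif position i)
pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]

def log_odds_score(window, background=None):
    """Sum over positions of log2( P(x_i | motif) / P(x_i | background) )."""
    if background is None:
        # Discrete uniform null model, as described in the question.
        background = {b: 1.0 / len(ALPHABET) for b in ALPHABET}
    return sum(
        math.log2(col[base] / background[base])
        for col, base in zip(pwm, window)
    )

print(log_odds_score("AGCT"))  # scored against the uniform null
print(log_odds_score("AGCT", {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}))  # non-uniform background
```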
Or should I say that the valuable part of MEME and HMMs is that they use expectation maximization, so that by maximizing the conditional expected (complete-data) log-likelihood at each iteration, the observed-data likelihood increases monotonically (as opposed to some heuristic method)?
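For reference, the EM guarantee I mean is the standard one (my notation, nothing specific to MEME or HMMER):

```latex
Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{Z \mid X,\, \theta^{(t)}}\!\big[\log P(X, Z \mid \theta)\big],
\qquad
\theta^{(t+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t)})
\;\Longrightarrow\;
\log P\big(X \mid \theta^{(t+1)}\big) \ge \log P\big(X \mid \theta^{(t)}\big).
```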
I also noticed that most HMM packages (e.g. HMMER3, as opposed to HMMER2) build the profile HMM from a multiple sequence alignment rather than training it with Baum-Welch. Doing so, of course, avoids estimating the background probabilities; a rough sketch of what that amounts to is below.
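Here is how I understand "training from an MSA", assuming simple Laplace pseudocounts for illustration (the real tools add sequence weighting and more sophisticated priors): the match-state emissions come straight from column counts, with no Baum-Welch iteration.

```python
# Rough sketch (my own, not HMMER's actual estimator): deriving per-column
# emission probabilities directly from an MSA by counting residues and adding
# Laplace pseudocounts. No iterative Baum-Welch training is involved.
from collections import Counter

ALPHABET = "ACGT"

def match_emissions(alignment, pseudocount=1.0):
    """One emission distribution per alignment column (gap characters are skipped)."""
    profile = []
    for column in zip(*alignment):  # iterate over columns of the MSA
        counts = Counter(c for c in column if c in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return profile

# Hypothetical toy alignment (rows are aligned sequences of equal length):
alignment = ["AGCT", "AGCT", "AGCA", "CGCT"]
for i, dist in enumerate(match_emissions(alignment)):
    print(i, {b: round(p, 2) for b, p in dist.items()})
```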
If anyone could provide the big picture, I'd appreciate it.
Yes. I am guessing that is why HMMER3 does not include a Baum-Welch training procedure for profile HMMs: the background distribution estimated during training would eventually be "discarded" anyway when performing a homology search against a database?
I don't know the exact answer, as I haven't gone through the HMMER code, but I don't think the global background frequencies are discarded during the search.
These papers may have the answer you are looking for:
https://www.ncbi.nlm.nih.gov/pubmed/20180275
https://www.ncbi.nlm.nih.gov/pubmed/22039361