Question: What's the point of estimating the background distribution in probabilistic models of motif finding such as MEME (mixture model with EM) or an HMM?
15 months ago by
guixien0 wrote:

I assume that when searching a database for homologous sequences with a profile built by MEME (a PWM) or by an HMM package (a profile HMM), hits are scored with a log-odds ratio in which the random model is an independent process whose emission probabilities follow the discrete uniform distribution. If so, what is the point of these probabilistic models estimating the background emission probabilities during training?
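
To make the premise concrete, here is a minimal sketch of log-odds scoring of one sequence window against a PWM, once with a uniform random model and once with a non-uniform one. The PWM, the GC-rich background, and the function name `log_odds_score` are all made up for illustration; this is not how MEME or any HMM package implements scoring.

```python
import math

def log_odds_score(window, pwm, background):
    """Sum of log2(motif probability / background probability) over a window."""
    return sum(
        math.log2(pwm[i][base] / background[base])
        for i, base in enumerate(window)
    )

# Hypothetical 3-column PWM (each column sums to 1).
pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]

# Uniform random model, as assumed in the question.
uniform = {base: 0.25 for base in "ACGT"}
# Hypothetical GC-rich background, e.g. estimated from the training data.
gc_rich = {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15}

score_uniform = log_odds_score("ACG", pwm, uniform)
score_gc_rich = log_odds_score("ACG", pwm, gc_rich)
print(f"uniform background: {score_uniform:.3f} bits")
print(f"GC-rich background: {score_gc_rich:.3f} bits")
```

The same window gets a lower score under the GC-rich background, because C and G are less surprising there. Which random model is used at search time therefore changes the scores.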

Or should I say that the valuable part of MEME and HMMs is that they use expectation maximization, so that the likelihood increases monotonically from one iteration to the next (as opposed to some heuristic method)?

I also noticed that most HMM packages (e.g. HMMER3, as opposed to HMMER2) train the HMM from a multiple sequence alignment rather than with Baum-Welch. Doing so, of course, avoids estimating the background probabilities.
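
As a rough illustration of training from an alignment rather than with Baum-Welch, the sketch below estimates per-column emission probabilities by counting residues in a toy DNA alignment. The add-one pseudocount is a deliberate simplification and is not claimed to be what HMMER or any other package does; the alignment and the function name `column_emissions` are hypothetical.

```python
from collections import Counter

ALPHABET = "ACGT"

def column_emissions(alignment, pseudocount=1.0):
    """Estimate emission probabilities for each alignment column by counting."""
    length = len(alignment[0])
    emissions = []
    for i in range(length):
        # Gap characters are skipped; in a profile HMM they would be
        # handled by delete states rather than by emissions.
        counts = Counter(seq[i] for seq in alignment if seq[i] in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        emissions.append(
            {b: (counts[b] + pseudocount) / total for b in ALPHABET}
        )
    return emissions

# Hypothetical toy alignment of four sequences.
alignment = ["ACGT", "ACGA", "AC-T", "TCGT"]
emissions = column_emissions(alignment)
for i, column in enumerate(emissions):
    print(i, {b: round(p, 3) for b, p in column.items()})
```

Nothing in this counting step estimates a background distribution; a random model still has to be supplied separately when the emissions are turned into log-odds scores.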

If anyone could provide the big picture, I'd appreciate it.

modified 15 months ago by Mensur Dlakic • written 15 months ago by guixien
15 months ago by
Mensur Dlakic8.1k
Mensur Dlakic8.1k wrote:

Background distribution of residues will affect the scoring.

Score = Sum over b of [ Pb(i) * log2( Pb(i) / P0b ) ]

where b = A, C, G, T (for DNA), Pb(i) is the frequency of residue b at position i of the motif, and P0b is the background frequency of residue b. If the sampled frequency of A is 0.35, the motif scanning score will be quite different when the background frequency of A is 0.25 than when it is 0.35; in the latter case the A term is zero, because log2(1) = 0. The same is true for HMMs, although most HMMs use amino acid frequencies estimated from large protein databases rather than from the proteins of an individual organism or a group of related organisms.
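
The effect can be checked numerically. The sketch below evaluates the formula above for a single motif column under three backgrounds; the column frequencies (apart from A = 0.35) and the AT-rich background are invented for illustration, and `column_score` is a hypothetical helper name.

```python
import math

def column_score(freqs, background):
    """Sum over b of Pb(i) * log2(Pb(i) / P0b) for one motif position, in bits."""
    return sum(
        p * math.log2(p / background[b])
        for b, p in freqs.items()
        if p > 0  # a residue with zero frequency contributes nothing
    )

# Hypothetical motif column with a sampled frequency of A equal to 0.35.
column = {"A": 0.35, "C": 0.15, "G": 0.25, "T": 0.25}

uniform = {b: 0.25 for b in "ACGT"}
# Hypothetical AT-rich background in which A also has frequency 0.35.
at_rich = {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}
# Background identical to the column itself.
matching = dict(column)

score_uniform = column_score(column, uniform)
score_at_rich = column_score(column, at_rich)
score_matching = column_score(column, matching)

print(f"uniform background:  {score_uniform:.4f} bits")
print(f"AT-rich background:  {score_at_rich:.4f} bits")
print(f"matching background: {score_matching:.4f} bits")
```

When the background equals the column frequencies, every term is log2(1) = 0 and the position carries no information, so the choice of background directly determines how much each position contributes to the score.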

written 15 months ago by Mensur Dlakic

Yes. I am guessing that's the reason why HMMER3 does not include a Baum-Welch procedure for training a profile HMM, since the background distribution estimated during training would be "discarded" anyway when performing a homology search in a database?

written 15 months ago by guixien

I don't know the exact answer, as I haven't gone through the HMMER code, but I don't think the global background frequencies are discarded during the search.

These papers may have the answer you are looking for:

written 15 months ago by Mensur Dlakic