Question: What's the point of estimating the background distribution in probabilistic models of motif finding such as MEME (mixture model with EM) or an HMM?
guixien wrote:

I suppose that database searches for homologous sequences, using profiles created either by MEME (represented by a PWM) or by an HMM package (represented by a profile HMM), score with the log-odds ratio, where the random model is an independent process whose emission probabilities follow the discrete uniform distribution. If so, what is the point of these probabilistic models estimating the background emission probabilities during training?

Or should I say that the valuable part of MEME and HMMs is that they use expectation maximization, so that the likelihood never decreases from one iteration to the next (as opposed to some heuristic method)?

I also noticed that most HMM packages (e.g. HMMER3, unlike HMMER2) build the HMM from a multiple sequence alignment instead of training it with Baum-Welch. Doing so, of course, avoids estimating the background probabilities.

If anyone could provide the big picture, I'd appreciate it.

modified 12 days ago by Mensur Dlakic • written 12 days ago by guixien
Mensur Dlakic wrote:

The background distribution of residues affects the scoring.

Score = Sum_b [ Pb(i) * log2( Pb(i) / P0b ) ]

where b = A, C, G, T (for DNA), Pb(i) is the frequency of residue b at position i of the motif, and P0b is the background frequency of residue b. If the sampled frequency of A is 0.35, the motif scanning score will be quite different when the background frequency of A is 0.25 than when it is 0.35. The same is true for HMMs, although most HMMs use amino acid frequencies estimated from large protein databases rather than from the proteins of an individual organism or a group of related organisms.
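As a rough illustration of the formula above, here is a minimal Python sketch that evaluates the score of a single motif position under two different backgrounds. The column frequencies and both background distributions are made-up values chosen to match the 0.35 vs. 0.25 example, and `column_score` is a hypothetical helper, not a function from MEME or HMMER.

```python
import math

def column_score(column_freqs, background):
    """Sum over residues b of Pb(i) * log2(Pb(i) / P0b) for one motif position."""
    return sum(
        p * math.log2(p / background[b])
        for b, p in column_freqs.items()
        if p > 0  # a residue with zero frequency contributes nothing to the sum
    )

# Hypothetical motif column in which A is observed at frequency 0.35.
column = {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}

# Two assumed backgrounds: uniform, and one that happens to equal the column.
uniform_bg = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
at_rich_bg = {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}

score_uniform = column_score(column, uniform_bg)
score_at_rich = column_score(column, at_rich_bg)

print(f"uniform background: {score_uniform:.4f} bits")
print(f"AT-rich background: {score_at_rich:.4f} bits")
```

Against the uniform background the column scores a little above 0.1 bits, while against a background identical to the column it scores zero: the same observed frequencies carry no signal when the background already explains them.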

written 12 days ago by Mensur Dlakic

Yes. I am guessing that is why HMMER3 does not include a Baum-Welch procedure for training a profile HMM: the background distribution estimated during training would eventually be "discarded" anyway when performing a homology search against a database?

written 11 days ago by guixien

I don't know the exact answer, as I haven't gone through the HMMER code, but I don't think the global background frequencies are discarded during the search.

These papers may have the answer you are looking for:

https://www.ncbi.nlm.nih.gov/pubmed/20180275

https://www.ncbi.nlm.nih.gov/pubmed/22039361
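To make the search-time role of the background concrete, here is a small Python sketch of log-odds scoring of one sequence window against a PWM. It is only an illustration of the general log-odds idea under assumed numbers: the 3-column PWM, the two backgrounds, and the helper `log_odds_score` are all hypothetical, and this is not how HMMER or MEME are implemented internally.

```python
import math

def log_odds_score(window, pwm, background):
    """Sum over positions of log2(P_motif(base at i) / P_background(base))."""
    return sum(
        math.log2(pwm[i][base] / background[base])
        for i, base in enumerate(window)
    )

# Hypothetical 3-column PWM; no zero entries, as if pseudocounts had been added.
pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]

# Two assumed null models for the same search.
uniform_bg = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
at_rich_bg = {"A": 0.40, "C": 0.10, "G": 0.10, "T": 0.40}

window = "ATA"  # the motif's consensus under this toy PWM
s_uniform = log_odds_score(window, pwm, uniform_bg)
s_at_rich = log_odds_score(window, pwm, at_rich_bg)

print(f"uniform background: {s_uniform:.3f} bits")
print(f"AT-rich background: {s_at_rich:.3f} bits")
```

The same window scores lower against the AT-rich null model, because A and T are less surprising there. Whichever background is used at search time, it enters every term of the score.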

written 11 days ago by Mensur Dlakic