Question: Non-Coding Markov Model In Prokaryotic Gene Prediction
7.4 years ago, aniket.schneider wrote:

I'm building a very basic Markov Model-based prokaryotic gene finder for a class project, and I have been reading some literature about GLIMMER for guidance. If I have understood the basic algorithm correctly, GLIMMER scores a given ORF in all six reading frames, normalizes the six scores so that they represent a probability that the ORF is a gene, and then predicts a gene if the ORF scores above a certain threshold in the correct reading frame (with some filtering for overlaps after this). I have two questions that I hope someone more familiar with these types of algorithms can give me some guidance with.

First, they mention earlier in the paper that intuitively one would want to have a seventh model for non-coding regions, but that this is "not strictly necessary". I'm not sure I understand why this isn't necessary. I imagine a situation where an ORF scores very poorly in all six reading frames, but the normalization makes the correct reading frame stand out, so it appears to be a gene. Wouldn't you need a non-coding model as a reference point?

Second, and probably related, how does one actually do this normalization? Is it as simple as just scaling the six scores so that they add up to 1.0? Or is there a more general way of normalizing the score from a Markov Model that accounts for the length of the sequence?
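
To make the "scale the six scores so they add up to 1.0" idea concrete, here is a minimal sketch. It assumes each frame's score is a natural-log probability, and it is not taken from the GLIMMER paper or source code. The function name `normalize_frame_scores` and the example numbers are made up for illustration. Subtracting the maximum before exponentiating avoids underflow for long sequences.

```python
import math

def normalize_frame_scores(log_scores):
    """Rescale per-frame log-probabilities so the results sum to 1.0."""
    # Shift by the largest score so exp() does not underflow to 0.0.
    m = max(log_scores)
    weights = [math.exp(s - m) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical log-probabilities for one ORF scored in six frames.
uniformly_poor = [-400.0] * 6
one_less_poor = [-395.0, -400.0, -400.0, -400.0, -400.0, -400.0]

print(normalize_frame_scores(uniformly_poor))
print(normalize_frame_scores(one_less_poor))
```

With these toy numbers, six equally poor frames each get 1/6, while a frame that is only 5 log-units better than the rest gets about 0.97 even though every raw score is very low. Relative scaling of this kind says nothing about how good the scores are in absolute terms.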

Please point out any egregious misunderstandings, as I am only just beginning to study these methods.

7.4 years ago, Niek De Klein (Netherlands) wrote:

If all six reading frames score poorly, normalization won't push any of the scores over the threshold. A separate model for non-coding regions is not strictly necessary: if all six frames score low, the region is not coding in any frame, so it is treated as non-coding. There are different methods of normalization; did they not describe their method in the paper?
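
For "scores low" to mean the same thing for short and long ORFs, one option is to divide the log-score by the sequence length. The sketch below shows that idea with a plain first-order Markov chain, not GLIMMER's interpolated model. The transition probabilities, the function name `per_base_log_score`, and the `THRESHOLD` value are all assumptions chosen for illustration.

```python
import math

def per_base_log_score(seq, log_init, log_trans):
    """Average log-probability per base under a first-order Markov chain."""
    total = log_init[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        total += log_trans[prev][cur]
    return total / len(seq)

BASES = "ACGT"
# Toy model: uniform start, and each base tends to repeat itself.
log_init = {b: math.log(0.25) for b in BASES}
log_trans = {
    prev: {cur: math.log(0.7 if cur == prev else 0.1) for cur in BASES}
    for prev in BASES
}

THRESHOLD = -1.0  # hypothetical per-base cutoff, not from any paper

for seq in ("AAAAAAAA", "ACGTACGT", "ACGTACGT" * 5):
    score = per_base_log_score(seq, log_init, log_trans)
    print(seq[:8], len(seq), round(score, 3), score > THRESHOLD)
```

Under this toy model the raw log-score of the 40-base sequence is roughly five times that of its 8-base version, but the per-base scores stay close to each other, so a single fixed cutoff can be applied to both.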


The paper didn't describe the normalization method. I'm trying to sort through the source code for it, but I haven't had much luck yet.

(reply by aniket.schneider, 7.4 years ago)

What is the name of the paper?

(reply by Niek De Klein, 7.4 years ago)

Microbial gene identification using interpolated Markov models

(reply by aniket.schneider, 7.4 years ago)