Non-Coding Markov Model In Prokaryotic Gene Prediction
12.0 years ago

I'm building a very basic Markov Model-based prokaryotic gene finder for a class project, and I have been reading some literature about GLIMMER for guidance. If I have understood the basic algorithm correctly, GLIMMER scores a given ORF in all six reading frames, normalizes the six scores so that they represent a probability that the ORF is a gene, and then predicts a gene if the ORF scores above a certain threshold in the correct reading frame (with some filtering for overlaps after this). I have two questions that I hope someone more familiar with these types of algorithms can give me some guidance with.
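For concreteness, the per-frame scoring step could be sketched roughly as follows. This is only a minimal fixed-order Markov model with pseudo-counts, not GLIMMER's interpolated model, and it is not frame-aware (a periodic model would keep a separate table per codon position). The names `train_counts`, `log_score`, `order` and `pseudo` are hypothetical and chosen just for this illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def train_counts(seqs, order=2):
    """Count how often each base follows each context of length `order`."""
    counts = Counter()
    for s in seqs:
        for i in range(order, len(s)):
            counts[(s[i - order:i], s[i])] += 1
    return counts

def log_score(seq, counts, order=2, pseudo=1.0):
    """Log-probability of seq under the model, with additive smoothing.

    The first `order` bases are skipped because they have no full context.
    """
    total = 0.0
    for i in range(order, len(seq)):
        ctx = seq[i - order:i]
        num = counts[(ctx, seq[i])] + pseudo
        den = sum(counts[(ctx, b)] for b in ALPHABET) + pseudo * len(ALPHABET)
        total += math.log(num / den)
    return total

# Toy usage: train on one made-up "coding" sequence, then score two candidates.
counts = train_counts(["ATGATGATG"], order=2)
similar = log_score("ATGATG", counts)
dissimilar = log_score("AAAAAA", counts)
```

Working in log space avoids underflow on long ORFs. To score the other five frames, one would apply the same function to the appropriately shifted or reverse-complemented sequence, or use frame-specific tables.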

First, they mention earlier in the paper that intuitively one would want to have a seventh model for non-coding regions, but that this is "not strictly necessary". I'm not sure I understand why this isn't necessary. I imagine a situation where an ORF scores very poorly in all six reading frames, but the normalization makes the correct reading frame stand out, so it appears to be a gene. Wouldn't you need a non-coding model as a reference point?

Second, and probably related, how does one actually do this normalization? Is it as simple as just scaling the six scores so that they add up to 1.0? Or is there a more general way of normalizing the score from a Markov Model that accounts for the length of the sequence?
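One simple reading of "scaling the six scores so that they add up to 1.0" can be sketched as below. It assumes each frame's score is a log-probability and is not taken from the GLIMMER source, so treat it as an illustration of the idea rather than the paper's method. The helper names `normalize_log_scores` and `per_base` are hypothetical.

```python
import math

def normalize_log_scores(log_scores):
    """Turn per-frame log-probabilities into values that sum to 1.

    Subtracting the maximum before exponentiating keeps exp() from
    underflowing when the log-scores are large and negative.
    """
    m = max(log_scores)
    weights = [math.exp(s - m) for s in log_scores]
    z = sum(weights)
    return [w / z for w in weights]

def per_base(log_score, length):
    """Length-adjusted score: average log-probability per base."""
    return log_score / length

# Six made-up log-scores for one ORF, one per reading frame.
frames = [-100.0, -120.0, -118.0, -125.0, -130.0, -122.0]
probs = normalize_log_scores(frames)

# Shifting every frame by the same amount gives the same normalized values.
shifted = normalize_log_scores([s - 50.0 for s in frames])
```

Under this particular scheme, adding the same constant to all six log-scores leaves the result unchanged, so only the differences between frames matter. Because all six frames of the same ORF have the same length, length cancels out within one ORF; something like `per_base` would only matter when comparing scores across ORFs of different lengths.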

Please point out any egregious misunderstandings, as I am only just beginning to study these methods.

12.0 years ago
Niek De Klein ★ 2.6k

If all 6 of the reading frames score poorly, normalization won't make one of the scores go over the threshold. The model for non-coding regions is not strictly necessary because if all 6 frames score low, the region is, by virtue of not being coding, a non-coding region. There are different methods of normalization; did they not put their method in the paper?


The paper didn't describe the normalization method. I'm trying to sort through the source code for it, but I haven't had much luck yet.


What is the name of the paper?
