Glimmer output file

7.2 years ago

bitpir ▴ 250

Hi,

I don't quite understand the output in the .predict file from the Glimmer 3.02 ORF predictor.

Here's a sample of the output file:

ref|NC_023013.1| Haloarcula hispanica N601 chromosome 1, complete sequence
orf00001        1     1575  +1    18.88
orf00003     2355     1645  -1    12.87

According to the documentation, the columns are: column 1 = gene ID, column 2 = start of the gene, column 3 = end of the gene, column 4 = reading frame, column 5 = the per-base “raw” score of the gene.
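For anyone who wants to work with these columns programmatically, here is a minimal Python sketch that reads lines in the layout shown above. The names `Prediction`, `parse_predict`, and `SAMPLE` are made up for illustration and are not part of Glimmer. The length calculation assumes the gene does not wrap around the origin of a circular sequence, and the sketch treats any line that is not a five-column gene line as a sequence header (with or without a leading `>`).

```python
from typing import Iterable, Iterator, NamedTuple


class Prediction(NamedTuple):
    """One gene line from a .predict file (illustrative container)."""
    seq_id: str       # header line of the sequence the gene belongs to
    orf_id: str       # column 1: gene identifier
    start: int        # column 2: start of the gene
    end: int          # column 3: end of the gene
    frame: int        # column 4: reading frame, e.g. +1 or -1
    raw_score: float  # column 5: per-base "raw" score

    @property
    def length(self) -> int:
        # Assumption: the gene does not wrap around the sequence origin.
        return abs(self.end - self.start) + 1


def parse_predict(lines: Iterable[str]) -> Iterator[Prediction]:
    """Yield one Prediction per gene line; other lines are taken as headers."""
    seq_id = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) == 5 and not line.startswith(">"):
            try:
                yield Prediction(
                    seq_id,
                    fields[0],
                    int(fields[1]),
                    int(fields[2]),
                    int(fields[3]),
                    float(fields[4]),
                )
                continue
            except ValueError:
                pass  # not a gene line after all; fall through to header
        seq_id = line.lstrip(">").strip()


SAMPLE = """\
ref|NC_023013.1| Haloarcula hispanica N601 chromosome 1, complete sequence
orf00001        1     1575  +1    18.88
orf00003     2355     1645  -1    12.87
"""

for p in parse_predict(SAMPLE.splitlines()):
    print(p.orf_id, p.length, p.frame, p.raw_score)
```

With the length available per gene, any length-based quantity can be computed directly from the parsed records.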

My questions are:

1. To calculate the ORF score (100 * log-odds ratio) of the gene, do I multiply column 5 by the length of the gene?
2. Is there a good threshold (either for column 5 or for the calculated score) for deciding whether a predicted ORF is likely to be real?

Thanks for the help!

ORF prediction Glimmer3 • 2.5k views

ADD COMMENT • link 7.2 years ago by bitpir ▴ 250

My question is: 1. 2.

I see more than one question :-)

ADD REPLY • link 7.2 years ago by Ram 45k

Good catch! Thought of another question but forgot to change the grammar! :)

ADD REPLY • link 7.2 years ago by bitpir ▴ 250

A bit pragmatic maybe, but it all really depends on how you run Glimmer.

Plain glimmer3 predictions often underpredict and don't get the start codon right. The included iterative workflow creates a first model, determines a PWM for the most likely Shine-Dalgarno site along with a better estimate of the start codon distribution, and reruns Glimmer using this information. The resulting gene model is far more accurate than the initial one.

ADD REPLY • link 7.2 years ago by Carambakaracho ★ 3.3k

Thanks for the info. About the iterative workflow: I often run into problems generating the PWM. It works for some files but not others. Is this common?

ADD REPLY • link 7.2 years ago by bitpir ▴ 250
