Question: What to edit in maker_opts.ctl to predict only complete genes with MAKER
3.1 years ago, ngs_new_user wrote:

Hi everyone, I am predicting genes in a non-model organism using the gene prediction software MAKER. I performed a first run with default parameters and noticed that some of the predicted genes are only partial (i.e. missing either the start or the stop codon). I would like to predict only complete genes (i.e. genes with both a start and a stop codon).

My question is: which part of the .ctl files should I edit to tell MAKER to predict only complete genes? I came across the option below in the maker_opts.ctl file:

always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no

If I change the value from 0 to 1, would it force the partially predicted genes to have start and stop codons even if they are wrong, introducing errors? Thanks in advance.

3.1 years ago, Philipp Bayer (6.7k) wrote:

This is the code that is triggered:

#walk out edges to force completion
if($CTL_OPT->{always_complete} && (!$has_start || !$has_stop)){
   $f = PhatHit_utils::adjust_start_stop($f, $seq);
   $transcript_seq  = get_transcript_seq($f, $seq);
   ($translation_seq, $offset, $end, $has_start, $has_stop) = get_translation_seq($transcript_seq, $f);
}

That method lives in lib/ (line 796) and is too long to quote. As far as I can tell, all it does is walk upstream until it finds a start codon (ATG/M) and downstream until it finds a stop codon. It stops looking if the contig ends or if it runs into Ns.
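To make the walking-out idea concrete, here is a minimal Python sketch of the behaviour described above. It is not MAKER's Perl code: the function name `walk_out`, the forward-strand-only handling, the 0-based end-exclusive coordinates, and the choice to also give up at an in-frame upstream stop codon are all my own assumptions for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def walk_out(seq, cds_start, cds_end):
    """Extend a forward-strand CDS in frame until it has a start and a stop.

    Hypothetical helper, not part of MAKER. Coordinates are 0-based and
    end-exclusive; cds_end - cds_start is assumed to be a multiple of 3.
    Returns (start, end, has_start, has_stop).
    """
    seq = seq.upper()
    start, end = cds_start, cds_end
    has_start = seq[start:start + 3] == "ATG"
    has_stop = seq[end - 3:end] in STOP_CODONS

    # Walk upstream one codon at a time until an ATG is found.
    pos = start
    while not has_start and pos - 3 >= 0:  # stop at the contig edge
        pos -= 3
        codon = seq[pos:pos + 3]
        # Give up at Ns. Stopping at an in-frame stop codon is an
        # assumption of this sketch, not something confirmed for MAKER.
        if "N" in codon or codon in STOP_CODONS:
            break
        if codon == "ATG":
            start, has_start = pos, True

    # Walk downstream one codon at a time until a stop codon is found.
    pos = end
    while not has_stop and pos + 3 <= len(seq):  # stop at the contig edge
        codon = seq[pos:pos + 3]
        if "N" in codon:
            break
        pos += 3
        if codon in STOP_CODONS:
            end, has_stop = pos, True

    return start, end, has_start, has_stop
```

Note that the extension succeeds whenever any in-frame ATG and stop codon exist nearby, which is exactly why the resulting gene boundaries are not guaranteed to be biologically correct.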

That will give you complete genes, but it's questionable whether the beginnings and ends of your genes are 'real'. If your input data is poor (say, low-coverage ESTs), I'd assume you get more incomplete genes because of missing data than you would with high-coverage RNA-Seq, where incomplete genes may have other causes (perhaps you have two copies, one of which is an incomplete pseudogene that RNA-Seq reads from the complete copy align to).
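One way to judge how much the option changes things is to tally complete and incomplete predictions before and after switching it on. The Python sketch below assumes you have already extracted the spliced CDS sequence of each predicted transcript into a dictionary (how you get those from the MAKER output is not covered here); `classify_cds` and `tally` are hypothetical helper names, not part of MAKER.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_cds(cds):
    """Label a CDS by whether it begins with ATG and ends with a stop codon."""
    cds = cds.upper()
    has_start = cds.startswith("ATG")
    has_stop = cds[-3:] in STOP_CODONS
    if has_start and has_stop:
        return "complete"
    if has_stop:
        return "missing_start"
    if has_start:
        return "missing_stop"
    return "missing_both"


def tally(cds_by_id):
    """Count how many transcripts fall into each completeness class."""
    return Counter(classify_cds(cds) for cds in cds_by_id.values())
```

Running `tally` on the predictions from both runs shows how many genes were "completed" by the option; those are the ones worth checking by hand against your evidence alignments.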
