Question: How does MAKER decide which proteins go into the final output?
Asked 3.5 years ago by Philipp Bayer (Australia/Perth/UWA):

After a MAKER run with three ab initio predictors, I ran fasta_merge -d on the resulting log file and got four output files: one for each ab initio predictor, and one called "Genome.maker.proteins.fasta", which looks like the "union" of the three ab initio predictors. However, the output of at least one of the ab initio programs contains many more proteins than the final "Genome.maker.proteins.fasta".

I first thought the final output only contained proteins with AED != 1, but proteins with AED=1 are still abundant in it. Other filtering options such as min_protein are set to 0 (standard maker_opts.ctl), so those shouldn't filter anything out either. It looks like relatively short proteins (<10 aa) were filtered from my ab initio predictions, but nothing in my options indicates this.
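
For anyone wanting to run the same check, here is a minimal sketch of the tally described above: total proteins, proteins shorter than 10 aa, and proteins with AED=1 for one FASTA file. It assumes the header carries a field like `AED:0.25` (and a separate `eAED:` field that should be ignored); that layout is an assumption about the file, and the function names are hypothetical, not part of MAKER.

```python
import re

def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

# Assumed header layout: ">ID protein AED:0.25 eAED:0.25 QI:..."
# The lookbehind stops the pattern from matching the "AED:" inside "eAED:".
AED_RE = re.compile(r"(?<![A-Za-z])AED:([0-9.]+)")

def summarize_proteins(text, min_len=10):
    """Count all proteins, those shorter than min_len, and those with AED == 1."""
    total = short = aed_one = 0
    for header, seq in read_fasta(text):
        total += 1
        # Ignore a trailing stop symbol when measuring length.
        if len(seq.rstrip("*")) < min_len:
            short += 1
        match = AED_RE.search(header)
        if match and float(match.group(1)) == 1.0:
            aed_one += 1
    return {"total": total, "short": short, "aed_one": aed_one}
```

Running `summarize_proteins(open(path).read())` on each of the four files and comparing the `short` counts would show whether short models are really missing from the final set or only appear to be.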

I can't find anything on this in the devel lists or the wiki. Is there any other filtering step done by MAKER that I'm not seeing right now?

Answer by Lesley Sitter (Netherlands), 3.5 years ago:

Have you viewed the ab initio predictions as separate tracks in something like IGV and compared them to the final gene model track? If you are only looking at the numbers, you may not be getting the entire picture. The reason you might have fewer final proteins than you have ab initio predictions is because maker tries to create a consensus gene model based on all the evidence so multiple smaller evidence models can still result in one final gene model.
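
If you would rather count than eyeball it, a rough sketch like the one below lists, for each final gene model, the ab initio models that overlap it on the same sequence and strand. This is only an overlap count under assumed inputs (tuples of ID, sequence, start, end, strand that you would pull from the GFF yourself); it is not how MAKER builds its consensus, and `overlapping_models` is a hypothetical helper name.

```python
from collections import defaultdict

def overlapping_models(final_models, ab_initio_models):
    """Map each final model ID to the ab initio model IDs that overlap it.

    Both inputs are iterables of (model_id, seqid, start, end, strand)
    with 1-based inclusive coordinates, as in GFF3.
    """
    by_location = defaultdict(list)
    for model_id, seqid, start, end, strand in ab_initio_models:
        by_location[(seqid, strand)].append((start, end, model_id))

    hits = {}
    for final_id, seqid, start, end, strand in final_models:
        candidates = by_location.get((seqid, strand), [])
        # Two inclusive intervals overlap when each starts before the other ends.
        hits[final_id] = [mid for s, e, mid in candidates if s <= end and e >= start]
    return hits
```

A final model with two or more hits is a candidate for "several small predictions absorbed into one larger model"; ab initio models that appear in no list are the ones that were genuinely dropped.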

Did you use the option always_complete=1? Maybe if the ab initio model does not contain a start / stop codon it might be discarded in the final product.

It could also be that MAKER had conflicting evidence for some models, for example if three tracks predict structure A and one track predicts a different structure. MAKER will then pick the gene model that is best supported by the evidence.


These are some good ideas!

>Did you use the option always_complete=1? Maybe if the ab initio model does not contain a start / stop codon it might be discarded in the final product.

It's always_complete=0, and about 10% of my proteins lack a starting M. (This is harder to check for transcripts, since those contain UTRs.)
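
For reference, a check like that can be sketched as below: given the protein sequences as plain strings, it returns the fraction that do not begin with methionine. The function name is hypothetical and the sequences are assumed to be already parsed out of the FASTA file.

```python
def fraction_without_start_met(sequences):
    """Return the fraction of non-empty protein sequences not starting with 'M'."""
    seqs = [s.strip() for s in sequences if s and s.strip()]
    if not seqs:
        return 0.0
    missing = sum(1 for s in seqs if not s.upper().startswith("M"))
    return missing / len(seqs)
```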

>The reason you might have fewer final proteins than you have ab initio predictions is because maker tries to create a consensus gene model based on all the evidence so multiple smaller evidence models can still result in one final gene model. 

I think this is the best explanation, and it would explain why so many of the smaller GeneMark-ES models in particular "disappeared": they were just merged into bigger models!

Reply written 3.5 years ago by Philipp Bayer