Question: Genome de novo annotation with Maker
11 months ago, alslonik wrote:

Hi all,

I am annotating a new plant genome with Maker, following its very detailed tutorial. I have also read a few really helpful posts about Maker here, but I still have some questions.

  1. SNAP training. How do you know when SNAP has been trained enough that you can start the final Maker run? I have run the training several times, and the number of genes differs every time. Plotted per round it looks roughly sinusoidal: the gene count goes up and down. So when do you stop? Do you wait for a plateau? How many rounds of training did you do, and why?

  2. My genome has an unusually high repeat content, so I decided to build a custom repeat library for it with RepeatModeler. Where in the options file do I add this repeat library?

THANKS a lot for your help,


11 months ago, jean.elbers wrote:

You can specify a custom repeat library (in FASTA format) with rmlib in the Repeat Masking section of the maker_opts.ctl file.

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=all #select a model organism for RepBase masking in RepeatMasker
rmlib=repeatlibrary.fa #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein=/opt/maker/data/te_proteins.fasta #provide a fasta file of transposable element proteins
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

This is an example Repeat Masking section
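
If you would rather set the value from a script than edit the file by hand, something like the following might work. It is only a sketch: set_rmlib is a hypothetical helper name (not part of MAKER), and the two-line file created here is a stand-in for a real maker_opts.ctl so that the snippet runs on its own. It assumes the rmlib line has the same "rmlib= #comment" layout as in the example section above and that the library path contains no "|" or "&" characters.

```shell
# Hypothetical helper (not part of MAKER): set the rmlib= value in an options file.
# Assumes the line looks like "rmlib=<value> #comment", as in the example section.
set_rmlib() {
    ctl_file=$1
    repeat_lib=$2
    # Replace everything between "rmlib=" and the "#" comment; keep the comment.
    # A backup of the original file is written next to it with a .bak suffix.
    sed -i.bak "s|^rmlib=[^#]*|rmlib=${repeat_lib} |" "$ctl_file"
}

# Demo on a stand-in file (assumption: your real file is maker_opts.ctl)
demo_ctl=$(mktemp)
printf '%s\n' \
    'model_org=all #select a model organism for RepBase masking in RepeatMasker' \
    'rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker' \
    > "$demo_ctl"

set_rmlib "$demo_ctl" repeatlibrary.fa
grep '^rmlib=' "$demo_ctl"
```

On a real run you would call set_rmlib maker_opts.ctl repeatlibrary.fa and check the result with grep before starting Maker.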

You might also consider running ProtExcluder on the output of RepeatModeler

# Run blastx then ProtExcluder to exclude known protein sequences from the RepeatModeler library
/usr/bin/blastx -num_threads 75 -db /genetics/elbers/maker/uniprot_sprot.fasta -evalue 1e-6 \
-query repeatlibrary.fa -out repeatlibrary.fa.blast

/opt/ProtExcluder1.1/ -f 50 repeatlibrary.fa.blast repeatlibrary.fa
# output of ProtExcluder is "temp"
# rename temp to whatever you desire
mv temp repeatlibrary.fa2
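
As a quick sanity check after filtering, you could compare how many FASTA records are in the library before and after. This is just a sketch: count_fasta_records is a hypothetical helper name, and the two small files written here are made-up stand-ins for repeatlibrary.fa and repeatlibrary.fa2 so the snippet runs by itself.

```shell
# Hypothetical helper: count FASTA records (header lines starting with ">").
count_fasta_records() {
    grep -c '^>' "$1"
}

# Stand-in files for the demo (assumption: real names are
# repeatlibrary.fa and repeatlibrary.fa2, as in the commands above)
workdir=$(mktemp -d)
printf '%s\n' '>family-1' 'ACGT' '>family-2' 'GGCC' '>family-3' 'TTAA' \
    > "$workdir/repeatlibrary.fa"
printf '%s\n' '>family-2' 'GGCC' '>family-3' 'TTAA' \
    > "$workdir/repeatlibrary.fa2"

before=$(count_fasta_records "$workdir/repeatlibrary.fa")
after=$(count_fasta_records "$workdir/repeatlibrary.fa2")
echo "records before: $before, after: $after, removed: $((before - after))"
```

Record counts are only a coarse check: they would not reveal entries that were shortened rather than dropped, so it may be worth comparing total sequence length as well.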

Many thanks for your help, Jean,

I am following your advice and excluding the protein sequences. My question now is: which protein database did you use? Only UniProt, or UniProt combined with RefSeq? Isn't that redundant? Do you exclude the transposon sequences, as the Maker wiki points out? Do you do that by aligning against the transposon library? It sounds like a really simple step, but somehow I am stuck at every turn...

The library that is provided in the manual is old (2011) and also appears to be corrupt...

Reply written 11 months ago by alslonik

I would use the most up-to-date Swiss-Prot database and not worry about combining RefSeq or transposon sequences. Someone with more experience might have better advice to give, but I think this is sufficient.

Reply written 11 months ago by jean.elbers

Got you. Thanks again!

Reply written 11 months ago by alslonik