Question: Genome de novo annotation with Maker
Asked 5 months ago by alslonik (Israel):

Hi all,

I am annotating my new plant genome and am working with Maker and its very detailed tutorial (http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018). I have read a few really helpful posts about Maker here as well, but I still have some questions.

  1. SNAP training. How do you know when SNAP has been trained enough and you can start the final Maker run? I have run the training several times, and the number of genes differs every time. It looks like a sinusoidal curve: the number of genes goes up and down. So when do you stop? Do you wait until it plateaus? How many training rounds did you run, and why?

  2. My genome has an unusually high repeat content, so I decided to build a custom repeat library for it with RepeatModeler. Where in the options file do I add this repeat library?

THANKS a lot for your help,

Alex

Answer by jean.elbers, 5 months ago:

You can specify a custom repeat library (in FASTA format) with the rmlib option in the Repeat Masking section of the maker_opts.ctl file:

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=all #select a model organism for RepBase masking in RepeatMasker
rmlib=repeatlibrary.fa #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein=/opt/maker/data/te_proteins.fasta #provide a fasta file of transposable element proteins
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

The block above is an example Repeat Masking section; rmlib is the line that points to the custom library.
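If you prefer to set the option from the command line rather than in an editor, something like the following should work. This is only a sketch: the file names `maker_opts.ctl` and `repeatlibrary.fa` are assumed to sit in the current directory, and the stand-in control file is written only so the snippet runs on its own (in a real project `maker -CTL` generates it).

```shell
# Assumed file names; adjust to your project.
opts=maker_opts.ctl
lib=repeatlibrary.fa

# Stand-in control file so the sketch runs on its own.
# In a real project `maker -CTL` has already created maker_opts.ctl.
if [ ! -f "$opts" ]; then
  printf '%s\n' \
    '#-----Repeat Masking (leave values blank to skip repeat masking)' \
    'model_org=all #select a model organism for RepBase masking in RepeatMasker' \
    'rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker' \
    > "$opts"
fi

# Replace whatever follows "rmlib=" up to the trailing comment, keeping the comment.
# A temporary file is used instead of `sed -i`, which differs between GNU and BSD sed.
sed "s|^rmlib=[^#]*|rmlib=$lib |" "$opts" > "$opts.tmp" && mv "$opts.tmp" "$opts"

# Print the resulting line to confirm the edit.
grep '^rmlib=' "$opts"
```

The `grep` at the end is just a check that the line now names your library before you start a long MAKER run.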

You might also consider running ProtExcluder on the output of RepeatModeler to remove library entries that match known proteins, as described here:

http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Basic

# Run blastx, then ProtExcluder, to exclude known protein sequences from the RepeatModeler library
# (the -db target must already be formatted as a protein BLAST database, e.g. with makeblastdb)
/usr/bin/blastx -num_threads 75 -db /genetics/elbers/maker/uniprot_sprot.fasta -evalue 1e-6 \
-query repeatlibrary.fa -out repeatlibrary.fa.blast

/opt/ProtExcluder1.1/ProtExcluder.pl -f 50 repeatlibrary.fa.blast repeatlibrary.fa
# output of ProtExcluder is "temp"
# rename temp to whatever you desire
mv temp repeatlibrary.fa2
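
A quick sanity check after this step is to compare how many sequences the library had before and after filtering. The sketch below is not part of ProtExcluder; `count_seqs` is a hypothetical helper, and the two `demo_*.fa` files are made-up stand-ins so the snippet runs on its own. With real data you would pass `repeatlibrary.fa` and `repeatlibrary.fa2` instead.

```shell
# Count FASTA records (header lines starting with ">") in a file.
count_seqs() {
  grep -c '^>' "$1"
}

# Made-up demo input so the sketch runs on its own; replace with your real files.
printf '>rnd-1_family-1#Unknown\nACGT\n>rnd-1_family-2#Unknown\nGGCC\n' > demo_before.fa
printf '>rnd-1_family-1#Unknown\nACGT\n' > demo_after.fa

before=$(count_seqs demo_before.fa)
after=$(count_seqs demo_after.fa)

# Report how many entries the filtering step dropped.
echo "before=$before after=$after removed=$((before - after))"
```

If the filtered library is empty or unchanged, it is worth rechecking the BLAST output and the file names passed to ProtExcluder before rerunning MAKER.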

Many thanks for your help, Jean,

I am following your advice and excluding the protein sequences. My question now is: which protein database did you use? Only UniProt, or UniProt combined with RefSeq? Wouldn't that be redundant? Do you also exclude the transposon sequences, as the Maker wiki points out, and do you do it by alignment to the transposon library? It sounds like a really simple step, but somehow I am stuck all the way...

The library that is provided in the manual is old (2011) and also appears to be corrupt...

(Reply written 5 months ago by alslonik)

I would use the most up-to-date Swiss-Prot database

wget ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz

and not worry about combining RefSeq or transposon sequences. Someone with more experience might have better advice to give, but I think this is sufficient.
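
For completeness, the downloaded file has to be decompressed and formatted before blastx can use it as `-db`. A minimal sketch, assuming `wget`, `gunzip` and BLAST+ (`makeblastdb`) are on your PATH; `prepare_sprot` is a hypothetical helper name, and the function is only defined here, not run:

```shell
# Hypothetical helper: fetch Swiss-Prot, decompress it, and index it for BLAST.
# Requires wget, gunzip and makeblastdb (BLAST+) on PATH.
prepare_sprot() {
  url="ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz"
  # Each step runs only if the previous one succeeded.
  wget "$url" &&
    gunzip -f uniprot_sprot.fasta.gz &&
    makeblastdb -in uniprot_sprot.fasta -dbtype prot
}
```

Calling `prepare_sprot` in the directory where you want the database leaves `uniprot_sprot.fasta` plus its BLAST index files there, and that path is what you would pass to `-db` in the blastx command.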

(Reply written 5 months ago by jean.elbers)

Got you. Thanks again!

(Reply written 5 months ago by alslonik)