Hi all,
I am currently annotating a new plant genome with MAKER, following its very detailed tutorial (http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018). I have read a few really helpful posts about MAKER here as well, but I still have some questions.
SNAP training. How do you actually know that SNAP has been trained enough and that you can start the final MAKER run? I have run the training several times, and the number of genes is different every time. It actually looks like a sinusoidal graph: the number of genes goes up and down... So when do you stop? Or how do you know that SNAP is trained? Do you wait until it plateaus? How many rounds of training did you do, and why?
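To make the "up and down" comparison concrete, one option is to count the gene features in the GFF3 file from each round and look at the relative change between consecutive rounds. Below is a minimal sketch of that bookkeeping, not an established stopping rule: the function names are made up for illustration, the input is assumed to be standard tab-separated GFF3 with `gene` in the type column, and gene count alone does not prove that SNAP is well trained.

```python
def count_genes(gff_lines, source=None):
    """Count GFF3 rows whose type column (column 3) is 'gene'.

    If `source` is given, only rows whose source column (column 2)
    matches it are counted.
    """
    n = 0
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # a GFF3 file may end with an embedded FASTA section
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3 and cols[2] == "gene":
            if source is None or cols[1] == source:
                n += 1
    return n


def relative_changes(counts):
    """Fractional change in gene count between consecutive rounds."""
    return [
        abs(b - a) / a if a else float("inf")
        for a, b in zip(counts, counts[1:])
    ]


# Tiny in-memory example standing in for one round's GFF3 output.
example_round = [
    "##gff-version 3\n",
    "chr1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1\n",
    "chr1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1\n",
    "chr1\tmaker\tgene\t2000\t3500\t.\t-\t.\tID=g2\n",
    "##FASTA\n",
    ">chr1\n",
    "ACGT\n",
]
print(count_genes(example_round))
print(relative_changes([100, 120, 114]))
```

With real data you would pass an open file handle for each round (for example a hypothetical `round1.all.gff`, `round2.all.gff`, ...) to `count_genes` and stop training once the changes stay small by whatever threshold you are comfortable with.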
My genome has an unusually high repeat content, which is why I decided to build a custom repeat library for it with RepeatModeler. The question is: where in the options file do I add this repeat library?
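For reference, as far as I know the relevant entry in `maker_opts.ctl` is `rmlib=`, which takes an organism-specific repeat library in FASTA format for RepeatMasker. The sketch below shows one way to fill it in from a script instead of by hand. The helper `set_ctl_option` and the library path are hypothetical, and the sample control-file text is abbreviated, so check it against your own `maker_opts.ctl`.

```python
import re


def set_ctl_option(ctl_text, key, value):
    """Set `key=value` in control-file text, keeping any trailing #comment.

    Only the first line starting with `key=` is changed. Raises KeyError
    if no such line exists.
    """
    pattern = re.compile(
        r"^(" + re.escape(key) + r"=)[^#\n]*", flags=re.MULTILINE
    )
    # A function replacement avoids backslash-escape surprises in `value`.
    new_text, n = pattern.subn(
        lambda m: m.group(1) + value + " ", ctl_text, count=1
    )
    if n == 0:
        raise KeyError(key)
    return new_text


# Abbreviated stand-in for the repeat-masking part of maker_opts.ctl.
sample_ctl = (
    "model_org=all #select a model organism for RepBase masking\n"
    "rmlib= #organism specific repeat library in fasta format\n"
)

# Hypothetical path to the RepeatModeler output library.
updated = set_ctl_option(sample_ctl, "rmlib", "my_genome-families.fa")
print(updated)
```

Editing the `rmlib=` line by hand in a text editor does exactly the same thing; the script is only convenient if you generate control files for several runs.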
THANKS a lot for your help,
Alex
Many thanks for your help, Jean,
I am following your advice and excluding the protein sequences. The question now is: which protein database did you use? Only UniProt? Or UniProt combined with RefSeq? Isn't that redundant? Do you exclude the transposon sequences, as pointed out in the MAKER wiki? Do you do it by aligning against the transposon library? It sounds like a really simple step, but somehow I keep getting stuck...
The library that is provided in the manual is old (2011) and also appears to be corrupt...
I would use the most up-to-date Swiss-Prot database and not worry about combining RefSeq or transposon sequences. Someone with more experience might have better advice to give, but I think this is sufficient.
Got you. Thanks again!
Hi, I've been looking for ProtExcluder, but I couldn't find it. Could you please share the download link?