Question

How to train annotation tools

3.0 years ago

Ric ▴ 440

I would like to train AUGUSTUS, SNAP and GlimmerHMM. I found protein sequences in GenBank and on orthodb.org, and I also found HMM files on busco-data.ezlab.org.

$ wget -c https://busco-data.ezlab.org/v5/data/lineages/viridiplantae_odb10.2020-09-10.tar.gz
$ wget -c https://v100.orthodb.org/download/odb10_plants_fasta.tar.gz

Are there any instructions on how to train these annotation tools?

Thank you in advance

genome annotation gene • 975 views

updated 3.0 years ago by Philipp Bayer 8.7k • written 3.0 years ago by Ric ▴ 440

score 1 · Answer 1 · 2021-10-24

You can train AUGUSTUS using BRAKER2 and your proteins; that should be a bit more accurate than training from the BUSCO output: https://academic.oup.com/nargab/article/3/1/lqaa108/6066535
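
As a rough sketch of the BRAKER2 route, a protein-only run might look like the following. The file names `genome.fa` and `proteins.fa`, the species name `my_species`, and the `DRY_RUN` helper are placeholders invented for this example. The option names are those of BRAKER2-era releases and may differ in yours, so check `braker.pl --help` first.

```shell
#!/bin/sh
# Sketch only: file names, species name and DRY_RUN are assumptions.
set -eu

GENOME=${GENOME:-genome.fa}        # soft-masked genome assembly (placeholder)
PROTEINS=${PROTEINS:-proteins.fa}  # e.g. the OrthoDB plant proteins in one FASTA
SPECIES=${SPECIES:-my_species}     # name for the new AUGUSTUS parameter set
DRY_RUN=${DRY_RUN:-1}              # 1 = only print the command, 0 = run it

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Train AUGUSTUS from protein evidence via BRAKER2.
train_with_braker() {
    run braker.pl \
        --genome="$GENOME" \
        --prot_seq="$PROTEINS" \
        --species="$SPECIES" \
        --softmasking
}

train_with_braker
```

Run it once as is to inspect the command, then with `DRY_RUN=0` once BRAKER2 and its dependencies are installed.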

If you don't want to run that, there's code on Biostars for using BUSCO to train AUGUSTUS; see the post "Augustus gene prediction for non model organism".
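
For the BUSCO route, a minimal sketch (not the code from the linked post) is to run BUSCO v5 in genome mode with its AUGUSTUS pipeline and the `--long` optimization switched on. `genome.fa`, the output name `busco_train`, and the `DRY_RUN` helper are placeholders for this example. `viridiplantae_odb10` is the lineage from the question; a path to the unpacked dataset can be given instead.

```shell
#!/bin/sh
# Sketch only: file names, output name and DRY_RUN are assumptions.
set -eu

GENOME=${GENOME:-genome.fa}              # genome assembly (placeholder)
LINEAGE=${LINEAGE:-viridiplantae_odb10}  # lineage name or path to the unpacked dataset
OUT=${OUT:-busco_train}                  # BUSCO output directory name
DRY_RUN=${DRY_RUN:-1}                    # 1 = only print the command, 0 = run it

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# --augustus selects the AUGUSTUS gene predictor;
# --long turns on AUGUSTUS self-training optimization (slow).
train_with_busco() {
    run busco -i "$GENOME" -l "$LINEAGE" -m genome -o "$OUT" --augustus --long
}

train_with_busco
```

In BUSCO v5 the retrained parameters usually end up under the run's `augustus_output/retraining_parameters` directory, but the exact layout depends on the version, so check your own output.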

There's also some code on Biostars for training SNAP from BUSCO output; see, for example, the post "convert BUSCO gff files to SNAP HMMs".
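
Once the gene models are in SNAP's ZFF format (the conversion step that the linked post deals with), the training itself follows the usual `fathom` / `forge` / `hmm-assembler.pl` sequence from the SNAP documentation. The sketch below assumes hypothetical inputs `genome.ann` (ZFF annotations) and `genome.dna` (matching FASTA), a placeholder species name, and an invented `DRY_RUN` helper. The flank size of 1000 bp is a commonly used value, not a recommendation from this thread.

```shell
#!/bin/sh
# Sketch only: input names, species name, flank size and DRY_RUN are assumptions.
set -eu

ANN=${ANN:-genome.ann}          # gene models in ZFF format (placeholder)
DNA=${DNA:-genome.dna}          # matching genome FASTA (placeholder)
SPECIES=${SPECIES:-my_species}  # label for the resulting HMM
FLANK=${FLANK:-1000}            # bp of flanking sequence kept around each gene
DRY_RUN=${DRY_RUN:-1}           # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

train_snap() {
    run fathom "$ANN" "$DNA" -gene-stats          # summary of the training genes
    run fathom "$ANN" "$DNA" -validate            # report problematic gene models
    run fathom "$ANN" "$DNA" -categorize "$FLANK" # writes uni.ann/uni.dna among others
    run fathom uni.ann uni.dna -export "$FLANK" -plus  # writes export.ann/export.dna
    run forge export.ann export.dna               # estimate parameters into the cwd
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ hmm-assembler.pl $SPECIES . > $SPECIES.hmm"
    else
        hmm-assembler.pl "$SPECIES" . > "$SPECIES.hmm"
    fi
}

train_snap
```

`forge` writes many small files into the current directory, so it is worth running this in a dedicated working directory.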

I've never used GlimmerHMM, so I can't evaluate the Biostars posts about it.