Question: Augustus Training Time
Daniel Standage (Davis, California, USA) wrote 8.5 years ago:

I am working on assembling and annotating the genome of a non-model organism, and I have a set of about 3k genes from this genome that I am using to train my ab initio gene predictors. For Augustus, I am following the training procedure documented on this page. I converted the data to GenBank format and split the data into a training set and a test set, each containing 1.5k annotated sequences. After making the appropriate parameter/config files for this species, I launched the script with the 1.5k training sequences.

The page includes the caveat that this script likely has to run overnight. However, it has been going for over 2 days now and shows no sign of stopping. I'm guessing this is taking so long because of the number of training sequences I have: the documentation recommends about 200 genes, whereas I have nearly 10 times that. Is this intuition correct? What runtimes have you had when training Augustus?
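One way to test this intuition would be to train on a smaller random subset of the GenBank records, closer to the roughly 200 genes the documentation recommends. The following is only a minimal sketch: it assumes a GenBank flat file in which every record ends with a line containing just `//`, and the function names are made up for illustration. It is not part of Augustus, and whether a smaller set gives comparable accuracy is not something this thread establishes.

```python
import random


def split_genbank_records(text):
    """Split GenBank flat-file text into records.

    Assumes each record is terminated by a line containing only '//'.
    Any trailing text without a terminator is dropped.
    """
    records, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "//":
            records.append("".join(current))
            current = []
    return records


def subsample_records(records, n, seed=0):
    """Return up to n randomly chosen records, kept in their original order.

    The seed makes the selection reproducible between runs.
    """
    if n >= len(records):
        return list(records)
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(records)), n))
    return [records[i] for i in chosen]
```

To use it, read the training file into a string, call `subsample_records(split_genbank_records(text), 200)`, and write the joined result to a new file to pass to the training script.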

modified 6.3 years ago by aaarsh88 • written 8.5 years ago by Daniel Standage
Sujai Kumar (United Kingdom) wrote 8.5 years ago:

Try the script that comes with Augustus 2.5.5 in scripts: --singleCPU --useexisting --genome=genome.fasta --species=speciesname --cdna=EST.fasta --trainingset=genome.gff3
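For a run that takes days, it can help to build the command in one place and log it before launching. Here is a rough sketch that assembles the flags listed above into an argument list. The script path is a placeholder you must supply yourself (the name is not given above), and `build_training_args` is a made-up helper, not something shipped with Augustus.

```python
import shlex


def build_training_args(script, genome, species, cdna, trainingset):
    """Assemble the argument list using only the flags quoted in the answer.

    `script` is a placeholder for the path to the training script in the
    Augustus scripts directory; its name is an assumption left to the caller.
    """
    return [
        script,
        "--singleCPU",
        "--useexisting",
        f"--genome={genome}",
        f"--species={species}",
        f"--cdna={cdna}",
        f"--trainingset={trainingset}",
    ]


def format_for_log(args):
    """Return a shell-quoted, copy-pasteable version of the command."""
    return shlex.join(args)
```

The resulting list can be passed directly to `subprocess.run(args, check=True)`, and `format_for_log(args)` gives a line to record next to the start time so long runs are easier to compare later.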

We get the genome.gff3 training set from the output of a first-pass run of MAKER using:

1. EST data (if available, same file as above)
2. Proteins from related species
3. A SNAP model trained using CEGMA
4. A GeneMark model (obtained by running GeneMark.ES on the draft genome)
5. Running maker2zff on the output of MAKER, and converting that to GFF3 (Carson Holt's scripts are brilliant; this one ensures that it only picks up high(er) quality models from the prediction set)

Yes, it takes a while. Two days sounds about right in singleCPU mode for a 100-200 Mb metazoan.

Once done, we run MAKER a second time using the Augustus model and more stringent settings.

Let me know if you need more details on any of these steps.

written 8.5 years ago by Sujai Kumar

I'm running MAKER on a non-model organism as well, but I only have EST data from a related species and none from the species itself. When running this script, should I omit the --cdna flag or use the EST data from the related species?

written 4.7 years ago by pernille.nilsson0
aaarsh88 (United States) wrote 6.3 years ago:

Hi,

I am working on the Oryza sativa genome, which is around 380 Mb. I started Augustus retraining three weeks ago and it is still running.

Could you please let me know how long it will run?

If possible, please suggest a multithreading option for the training step so that it finishes sooner.




written 6.3 years ago by aaarsh88

Hello, I have the same problem: my training has been running for about 2 weeks. Did you solve yours?

written 6.1 years ago by zy04122570

As this is a separate question, it should have been posted as a new thread.

The only way to speed things up is to configure MAKER to use MPI. It takes me about 6 days on 16 processors to finish one round on a vertebrate genome of ~2 Gb in ~150,000 scaffolds, with protein evidence.

written 5.7 years ago by mtollis30

Hello,

I'm running the training with a training set of 1000 genes and the parameter -cpus=20, on a 650M genome, for 5 rounds (default). One week has passed. All Augustus processes have stopped except one, which is still running with no sign of stopping, and the nohup file is no longer receiving any new output.

What happened with your run afterwards? Could you share your experience here? Thanks a lot.

written 4.0 years ago by dukecomeback40