Question: Training Set With Augustus
璐 wrote, 7.2 years ago:

Hi, I have recently been working on annotating a plant genome, and I chose AUGUSTUS to predict genes. I have read the documentation on training sets, but I can't understand it.

First, the "retraining AUGUSTUS" protocol needs a training set, a test set and meta parameters. Are the training set and the test set complete sequences? How can I get them? From NCBI? And how do I configure the meta parameters file (*.cfg)?

Second, the hints file: where does it come from, or how is it generated?

Is there any relationship between "retraining AUGUSTUS" and the hints?

Do the two web sites point to the same thing?

Hoping for a reply!

David R. Powell (Melbourne, Australia) wrote, 6.9 years ago:

The training set is a file of genes in GenBank format that AUGUSTUS uses for training. The test set is also a file of genes in GenBank format, which you can use to assess the quality of the training. The meta parameters are various parameters used by AUGUSTUS for prediction.

You must choose your own training and test set of genes. The "retraining AUGUSTUS" page suggests a number of possible sources:

  • GenBank
  • Spliced alignments of ESTs against the assembled genomic sequence, e.g. PASA
  • Spliced alignments of protein sequences of a related species against the assembled genomic sequence, e.g. GeneWise
  • Data from a related species
  • Iterate retraining with predicted genes
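
Once you have a single GenBank file of genes from one of these sources, it has to be divided into a training set and a test set. The following is a minimal Python sketch of that split, not the method from the AUGUSTUS documentation (the AUGUSTUS distribution ships its own script for this, randomSplit.pl, which is probably what you want in practice). The function names and the `seed` argument are my own, and the sketch assumes the usual GenBank flat-file convention that every record ends with a line containing only `//`.

```python
import random


def read_genbank_records(text):
    """Split GenBank flat-file text into records.

    Each record is assumed to end with a line containing only '//'.
    Anything after the last '//' line is ignored.
    """
    records, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "//":
            records.append("".join(current))
            current = []
    return records


def split_train_test(records, n_test, seed=0):
    """Randomly hold out n_test records; return (train, test)."""
    if not 0 < n_test < len(records):
        raise ValueError("n_test must be between 1 and len(records) - 1")
    shuffled = list(records)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]
```

You would read the GenBank file into a string, call `read_genbank_records` on it, and write the two returned lists to separate files. The point of the random hold-out is that the test genes are never seen during training, so the accuracy measured on them is not overly optimistic.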

The meta parameters should be based on the generic ones that come with AUGUSTUS, in generic_parameters.cfg and generic_weightmatrix.cfg.

The first link you provide describes how to perform the training of AUGUSTUS manually. The second link describes another program that can automate this training for you.

Training AUGUSTUS can seem intimidating at first, but if you follow the retraining document it is reasonably straightforward. In particular, the steps in the section 3. RUN THE SCRIPT are easy to follow.
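
To give a rough idea of what the manual route involves, here is a small Python sketch that only prints the sequence of commands, as I recall them from the retraining document. Treat the command names and the `--species` option as assumptions to check against your installed version; the species name and file names are hypothetical placeholders, and nothing here actually runs AUGUSTUS.

```python
import shlex


def retraining_commands(species, train_gb, test_gb):
    """Return the manual retraining steps as a list of argument lists.

    Command names and options are recalled from the AUGUSTUS retraining
    document and should be verified against your installation.
    """
    opt = f"--species={species}"
    return [
        ["new_species.pl", opt],                   # create parameter files from the generic ones
        ["etraining", opt, train_gb],              # initial training on the training set
        ["augustus", opt, test_gb],                # predict on the test set to check accuracy
        ["optimize_augustus.pl", opt, train_gb],   # tune the meta parameters (slow)
        ["etraining", opt, train_gb],              # retrain with the optimised meta parameters
    ]


if __name__ == "__main__":
    # "myplant" and the two file names are placeholders for illustration.
    for cmd in retraining_commands("myplant", "genes.gb.train", "genes.gb.test"):
        print(shlex.join(cmd))
```

Printing the commands first, rather than executing them, lets you compare each step with the retraining document before committing to the optimisation step, which can take a long time.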


Dear David,

I'm using the automated program, with a training set of 1000 genes and the parameter -cpus=20, on a 650M genome, for 5 rounds (the default). One week has passed; all AUGUSTUS processes have stopped except one, which is still running with no sign of finishing, and the nohup file is no longer receiving any new output.

This is quite a dilemma for me. Can you give me some advice? Thanks.

         Du Kang
Reply written 2.8 years ago by dukecomeback