Question: Training Augustus quality
2.2 years ago by
Nastasiia 10 wrote:

Hi, has anyone trained AUGUSTUS? I'm specifically interested in the training quality that you can estimate on a test gene set. My pipeline in R for choosing the training set is as follows (I use a GFF from GenBank):

1) From the GFF, keep only genes with a defined product. NO hypothetical or predicted proteins.
2) Remove all alternative transcripts.
3) Remove exon-less genes.
4) Check for mRNA overlaps (after adding 1000 bp flanks) and get rid of overlapping genes.
5) Finally, keep only genes with annotated UTRs (just >30 bp), as I got better results that way. I create the UTR features in the GFF myself.
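Step 4 can be illustrated with a short sketch. This is not the poster's R code; it is a minimal Python version under stated assumptions: each gene is reduced to a single interval (the hypothetical `Gene` tuple below), "adding 1000 flanks" is taken to mean extending every interval by 1000 bp on both sides, and both members of an overlapping pair are discarded.

```python
from collections import defaultdict, namedtuple

# Hypothetical minimal gene record; coordinates are 1-based and inclusive, as in GFF.
Gene = namedtuple("Gene", ["gene_id", "seqid", "start", "end"])


def drop_overlapping(genes, flank=1000):
    """Return the genes whose flank-extended interval overlaps no other gene.

    Assumption: overlap is tested on intervals extended by `flank` bp on both
    sides, and every gene involved in an overlap is removed.
    """
    by_seq = defaultdict(list)
    for g in genes:
        by_seq[g.seqid].append(g)

    rejected = set()
    for seq_genes in by_seq.values():
        # Pairwise comparison: quadratic, but simple and fine for a sketch.
        for i, a in enumerate(seq_genes):
            for b in seq_genes[i + 1:]:
                if a.start - flank <= b.end + flank and b.start - flank <= a.end + flank:
                    rejected.update((a.gene_id, b.gene_id))
    return [g for g in genes if g.gene_id not in rejected]


genes = [
    Gene("g1", "scaffold_1", 5000, 7000),
    Gene("g2", "scaffold_1", 8500, 9500),    # 1500 bp after g1, so the flanks overlap
    Gene("g3", "scaffold_1", 20000, 22000),
    Gene("g4", "scaffold_2", 5000, 7000),    # different scaffold, never compared with g1
]
kept = [g.gene_id for g in drop_overlapping(genes)]
print(kept)
```

With these toy coordinates, g1 and g2 are both dropped and g3 and g4 are kept. Whether strand should matter is not stated in the post; this sketch ignores it.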

The resulting GFF table has ~500 genes with CDS and UTR features. I convert it to GenBank format with the AUGUSTUS script and split it into ~350 training genes and a test set. After running etraining and checking on the test set, the best result I've got for gene prediction is about 0.5. Optimizing doesn't help much. The tutorial suggests setting "excludestopocodon..." to TRUE in bug_parameters.cfg, which in my case makes the training quality even worse.
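The post does not say which metric the 0.5 refers to. Assuming it is a gene-level score of the kind AUGUSTUS reports in its evaluation summary (sensitivity: fraction of annotated genes predicted exactly; specificity: fraction of predicted genes that exactly match an annotated gene), the calculation can be sketched as below. The gene representation and the function name are made up for illustration, and the coordinates are toy data, not results.

```python
def gene_level_accuracy(annotated, predicted):
    """Return (sensitivity, specificity) at the gene level.

    Each gene is a hashable description of its coding structure, here a tuple
    (seqid, strand, ((start, end), ...)). A gene counts as correct only if
    every CDS exon boundary matches exactly.
    """
    annotated_set = set(annotated)
    predicted_set = set(predicted)
    exact = annotated_set & predicted_set
    sensitivity = len(exact) / len(annotated_set) if annotated_set else 0.0
    specificity = len(exact) / len(predicted_set) if predicted_set else 0.0
    return sensitivity, specificity


annotated = [
    ("scaffold_1", "+", ((100, 200), (300, 400))),
    ("scaffold_1", "-", ((900, 1200),)),
    ("scaffold_2", "+", ((50, 150), (250, 350))),
    ("scaffold_2", "+", ((1000, 1500),)),
]
predicted = [
    ("scaffold_1", "+", ((100, 200), (300, 400))),   # exact match
    ("scaffold_1", "-", ((900, 1200),)),             # exact match
    ("scaffold_2", "+", ((50, 150), (260, 350))),    # one exon boundary off, counts as wrong
]
sens, spec = gene_level_accuracy(annotated, predicted)
print(f"gene sensitivity={sens:.2f} specificity={spec:.2f}")
```

Because a single shifted exon boundary makes the whole gene count as wrong, gene-level scores are much stricter than exon-level or nucleotide-level ones.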

So my main questions are: what gene/exon/UTR prediction quality do you get? Should it be this low? Do you see a flaw in my pipeline, and what would you suggest?



modified 2.1 years ago by Biostar ♦♦ 20 • written 2.2 years ago by Nastasiia 10

What type of genome are you annotating? One important step is to remove redundancy from your gene set: check that no gene in the set shares more than ~85% similarity with another one, since that could bias your training. I ran a test on the optimal number of genes for a good AUGUSTUS training, and it was between 500 and 750, so your number of genes is a bit low...

written 2.1 years ago by Juke-34 2.0k
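The redundancy check suggested above could look roughly like the sketch below. It assumes pairwise similarities have already been computed elsewhere (for example from an all-against-all protein alignment) and are supplied as fractions in a dictionary. The function name, the input format, and the greedy keep-first strategy are illustrative assumptions, not something specified in the reply.

```python
def filter_redundant(gene_ids, identity, threshold=0.85):
    """Greedily keep genes that are not too similar to any gene already kept.

    `identity` maps frozenset({id_a, id_b}) to a similarity in [0, 1];
    pairs missing from the mapping are treated as dissimilar (0.0).
    """
    kept = []
    for gid in gene_ids:
        too_similar = any(
            identity.get(frozenset((gid, other)), 0.0) > threshold for other in kept
        )
        if not too_similar:
            kept.append(gid)
    return kept


# Toy similarities, invented for the example.
identity = {
    frozenset(("g1", "g2")): 0.92,   # above the threshold: g2 is dropped
    frozenset(("g2", "g3")): 0.40,
    frozenset(("g3", "g4")): 0.85,   # exactly at the threshold: g4 is kept
}
non_redundant = filter_redundant(["g1", "g2", "g3", "g4"], identity)
print(non_redundant)
```

The result depends on input order, because the first gene of each similar group is the one retained. Sorting the input beforehand, for instance by annotation confidence, would make that choice deliberate.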

Thanks. It's a diatom; I'm training on the Fragilariopsis cylindrus genome. Right, I did not check redundancy. I should try this.

written 2.1 years ago by Nastasiia 10

