Augustus training quality

7.2 years ago

Nastasiia

Hi, has anyone trained Augustus? I'm particularly interested in the training quality, which can be estimated on a test gene set. My R pipeline for choosing the training set (starting from the GenBank GFF) is:
1) From the GFF, select all genes with a defined product; no hypothetical or predicted proteins.
2) Remove all alternative transcripts.
3) Remove exon-less genes.
4) Check for mRNA overlaps (after adding 1000 bp flanks) and discard overlapping genes.
5) Finally, keep only genes with annotated UTRs (just > 30 bp), since this gave me better results. I create the UTR features in the GFF myself.
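Steps 1 to 4 can be sketched in Python as below. This is not the R pipeline from the post, only a minimal illustration under stated assumptions: the input is GFF3 in which `product` sits on the mRNA features and exons point to their mRNA through `Parent` (typical of NCBI GenBank-derived GFF, but check your file); "removing alternative transcripts" is read as keeping only the first mRNA listed for each gene; and the 1000 bp flank is added to both mRNAs before testing for overlap. The function names are hypothetical, and step 5 (UTR selection) is not covered.

```python
from collections import defaultdict

# Assumption: these words in the product mark proteins to exclude (step 1).
BAD_PRODUCT_WORDS = ("hypothetical", "predicted")


def parse_attributes(field):
    """Parse a GFF3 column-9 string such as 'ID=rna1;Parent=gene1'."""
    attrs = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key] = value
    return attrs


def select_training_genes(gff_lines, flank=1000):
    """Return the sorted IDs of genes passing filtering steps 1-4."""
    mrnas_by_gene = defaultdict(list)  # gene ID -> mRNA IDs in file order
    mrna = {}                          # mRNA ID -> (gene, seqid, start, end, product)
    exon_count = defaultdict(int)      # mRNA ID -> number of exon features
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        seqid, _, ftype, start, end, _, _, _, attr_field = cols
        attrs = parse_attributes(attr_field)
        if ftype == "mRNA":
            gene = attrs["Parent"]
            mrnas_by_gene[gene].append(attrs["ID"])
            mrna[attrs["ID"]] = (gene, seqid, int(start), int(end),
                                 attrs.get("product", ""))
        elif ftype == "exon":
            exon_count[attrs["Parent"]] += 1

    candidates = {}
    for gene, mrna_ids in mrnas_by_gene.items():
        first = mrna_ids[0]  # step 2 (assumption): keep the first transcript only
        _, seqid, start, end, product = mrna[first]
        # step 1: a product must be given and must not be hypothetical/predicted
        if not product or any(w in product.lower() for w in BAD_PRODUCT_WORDS):
            continue
        if exon_count[first] == 0:  # step 3: drop exon-less genes
            continue
        candidates[gene] = (seqid, start, end)

    # step 4: drop a candidate if its flanked mRNA overlaps the flanked mRNA
    # of any other gene in the file (including genes filtered out above)
    kept = []
    for gene, (seqid, start, end) in candidates.items():
        overlapping = any(
            other_gene != gene and other_seqid == seqid
            and start - flank <= o_end + flank and o_start - flank <= end + flank
            for other_gene, other_seqid, o_start, o_end, _ in mrna.values()
        )
        if not overlapping:
            kept.append(gene)
    return sorted(kept)
```

Checking overlaps against every annotated mRNA, not just the surviving candidates, is a deliberate choice here: a training gene sitting next to a hypothetical gene is still not cleanly flanked by intergenic sequence.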

I convert the resulting GFF table (~500 genes, with CDS and UTR features) to GenBank format with an Augustus script and split it into a training set of ~350 genes and a test set. After etraining and evaluating on the test set, the best gene-level prediction accuracy I've got is about 0.5. Optimizing the parameters doesn't help much. The tutorial suggests setting "excludestopocodon..." to TRUE in bug_parameters.cfg, which in my case makes the training quality even worse.
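The train/test split can be illustrated with a short Python sketch. The Augustus scripts directory ships its own helper for this (randomSplit.pl); the code below is only an independent illustration of the same idea, assuming a plain GenBank flat file in which every record ends with a `//` line. The function names and the file name in the usage comment are hypothetical.

```python
import random


def read_genbank_records(text):
    """Split a GenBank flat file into records; each record ends with a '//' line."""
    records, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "//":
            records.append("".join(current))
            current = []
    return records


def split_train_test(records, n_test, seed=42):
    """Randomly assign n_test records to the test set and the rest to training."""
    if not 0 <= n_test <= len(records):
        raise ValueError("n_test must be between 0 and the number of records")
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    test_idx = set(rng.sample(range(len(records)), n_test))
    train = [r for i, r in enumerate(records) if i not in test_idx]
    test = [r for i, r in enumerate(records) if i in test_idx]
    return train, test


# Usage with a hypothetical file name (~500 genes -> ~350 train, ~150 test):
# with open("genes.gb") as fh:
#     train, test = split_train_test(read_genbank_records(fh.read()), n_test=150)
```

Fixing the seed means that rerunning the pipeline with different parameters is evaluated on the same test genes, so changes in accuracy are not just noise from a new split.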

So my main questions are: what gene/exon/UTR prediction accuracies do you get? Should they be this low? Do you see any flaw in my pipeline, and what would you suggest?
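When comparing accuracy numbers, it may help to recall how gene finders usually score them. The sketch below assumes the standard definitions: a feature counts as correct only if it matches the annotation exactly, sensitivity = correct / annotated and specificity = correct / predicted. Augustus's evaluation on a test set reports nucleotide-, exon- and gene-level values, but check its output for the exact definitions it uses. The function name and coordinates are hypothetical.

```python
def sensitivity_specificity(annotated, predicted):
    """Gene-finder style accuracy: a feature is correct only if it exactly
    matches an annotated feature. Sensitivity = TP / annotated,
    specificity = TP / predicted."""
    ann, pred = set(annotated), set(predicted)
    tp = len(ann & pred)
    sensitivity = tp / len(ann) if ann else 0.0
    specificity = tp / len(pred) if pred else 0.0
    return sensitivity, specificity


# Exon level: exons as (seqid, strand, start, end); hypothetical coordinates.
annotated_exons = [("chr1", "+", 100, 250), ("chr1", "+", 400, 520),
                   ("chr1", "+", 700, 900), ("chr1", "-", 2000, 2300)]
predicted_exons = [("chr1", "+", 100, 250), ("chr1", "+", 400, 520),
                   ("chr1", "+", 700, 880), ("chr1", "-", 2000, 2300)]
print(sensitivity_specificity(annotated_exons, predicted_exons))  # (0.75, 0.75)

# Gene level: each gene is the tuple of all its exons, so a single wrong
# exon boundary makes the whole gene count as wrong.
annotated_genes = [tuple(annotated_exons[:3]), (annotated_exons[3],)]
predicted_genes = [tuple(predicted_exons[:3]), (predicted_exons[3],)]
print(sensitivity_specificity(annotated_genes, predicted_genes))  # (0.5, 0.5)
```

The toy example shows why gene-level values are typically well below exon-level ones: one misplaced splice site drops the exon score only slightly but removes the whole multi-exon gene from the correct set.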

Thanks!

Nastya

written 7.2 years ago by Nastasiia

What type of genome are you annotating? One important step is to remove redundancy from your gene set: check that no gene in the set shares more than ~85% similarity with another one, since redundant genes can bias the training. I have tested the optimal number of genes for a good Augustus training, and it was between 500 and 750, so your number of genes is a bit low.
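A greedy redundancy filter along these lines could look like the Python sketch below. It is an illustration only: `difflib.SequenceMatcher.ratio()` serves as a crude stand-in for alignment-based identity, and for real protein sets a dedicated tool such as BLASTp or CD-HIT would be the usual choice. The ~85% cut-off comes from the reply above; the function names and sequences are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Crude stand-in for alignment identity: 2 * matches / (len(a) + len(b))."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def remove_redundant(proteins, threshold=0.85):
    """Greedily keep a protein only if it is at most `threshold` similar to
    every protein kept so far. Input order matters, so sort the dict first
    (e.g. by annotation quality or length) if some genes should be preferred."""
    kept = {}
    for name, seq in proteins.items():
        if all(similarity(seq, other) <= threshold for other in kept.values()):
            kept[name] = seq
    return list(kept)


proteins = {
    "p1": "MKTAYIAKQRQISFVKSHFS",
    "p2": "MKTAYIAKQRQISFVKSHFA",  # one substitution relative to p1
    "p3": "WWWWWWWWWWPPPPPPPPPP",
}
print(remove_redundant(proteins))  # ['p1', 'p3']
```

The pairwise comparison is quadratic in the number of genes, which is acceptable for a few hundred training candidates but is another reason to switch to a clustering tool for larger sets.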

7.2 years ago by Juke34

Thanks. It's a diatom: I'm training on the Fragilariopsis cylindrus genome. Indeed, I did not check redundancy; I should try this.

7.2 years ago by Nastasiia