Question: Augustus result interpretation
5.0 years ago by
bioinformaticssrm2011 wrote:


I am new to Augustus, and I used it to analyse a fungal genome.

I used this command:

augustus --species=human --UTR=on sequence.fasta > sequence_augustus.gff


I got this result:

# ----- prediction on sequence number 1 (length = 11239, name = contig00001) -----
# Constraints/Hints:
# Predicted genes for sequence number 1 on both strands
# start gene g1
contig00001    AUGUSTUS    gene    1476    4367    1    +    .    g1
contig00001    AUGUSTUS    transcript    1476    4367    .    +    .    g1.t1
contig00001    AUGUSTUS    tss    1476    1476    .    +    .    transcript_id "g1.t1"; gene_id "g1";
contig00001    AUGUSTUS    exon    1476    1559    .    +    .    transcript_id "g1.t1"; gene_id "g1";
contig00001    AUGUSTUS    exon    2030    4367    .    +    .    transcript_id "g1.t1"; gene_id "g1";
contig00001    AUGUSTUS    start_codon    2378    2380    .    +    0    transcript_id "g1.t1"; gene_id "g1";
contig00001    AUGUSTUS    CDS    2378    3223    .    +    0    transcript_id "g1.t1"; gene_id "g1";
contig00001    AUGUSTUS    stop_codon    3221    3223    .    +    0    transcript_id "g1.t1"; gene_id "g1";
contig00001    AUGUSTUS    tts    4367    4367    .    +    .    transcript_id "g1.t1"; gene_id "g1";
# end gene g1


My question: what does the above result indicate, and what about the protein sequence? If this protein corresponds to the first nucleotide sequence (contig 1) in my complete FASTA file (which contains many contigs), should I use this protein sequence and run a Batch CD-Search for annotation? Or are there other tools for annotating a fungal genome?

Any suggestions are welcome.

Regards !



5.0 years ago by
Bergen, Norway
Michael Dondrup wrote:

The result is in GFF2 format. The full output also contains comment lines with the predicted protein sequence. The GFF lines describe the predicted gene models. The amino acid sequence is the translation of the predicted coding sequence of the first predicted gene on contig 1, from the start codon (included as 'M') to the stop codon. You can use BLASTP against NR for a quick search.
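
To make the coordinates concrete, here is a minimal Python sketch that reads the exon and CDS lines from the question and works out the lengths of the coding sequence and of the two UTRs. The function names are made up for illustration. The sketch assumes a single transcript on the plus strand and a stop codon that lies inside the CDS, as the coordinates in this particular output show (CDS ends at 3223, stop_codon is 3221 to 3223).

```python
# Exon and CDS lines copied from the question. A real Augustus file is
# tab-separated; split() also copes with the spaces of the forum paste.
GFF_LINES = """\
contig00001 AUGUSTUS exon 1476 1559 . + . transcript_id "g1.t1"; gene_id "g1";
contig00001 AUGUSTUS exon 2030 4367 . + . transcript_id "g1.t1"; gene_id "g1";
contig00001 AUGUSTUS CDS 2378 3223 . + 0 transcript_id "g1.t1"; gene_id "g1";
"""


def parse_features(text):
    """Yield (feature, start, end) for every non-comment, non-empty line."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split()
        yield cols[2], int(cols[3]), int(cols[4])


def summarize_plus_strand(text):
    """Summarise one plus-strand transcript (hypothetical helper)."""
    feats = list(parse_features(text))
    exons = [(s, e) for f, s, e in feats if f == "exon"]
    cds = [(s, e) for f, s, e in feats if f == "CDS"]
    cds_start = min(s for s, _ in cds)
    cds_end = max(e for _, e in cds)
    cds_nt = sum(e - s + 1 for s, e in cds)  # coordinates are inclusive
    # Exonic bases before the CDS form the 5' UTR, those after it the 3' UTR.
    utr5 = sum(max(0, min(e, cds_start - 1) - s + 1) for s, e in exons)
    utr3 = sum(max(0, e - max(s, cds_end + 1) + 1) for s, e in exons)
    return {
        "cds_nt": cds_nt,
        # Assumption: the stop codon is part of the CDS, so subtract one codon.
        "protein_aa": cds_nt // 3 - 1,
        "utr5_nt": utr5,
        "utr3_nt": utr3,
    }


print(summarize_plus_strand(GFF_LINES))
# {'cds_nt': 846, 'protein_aa': 281, 'utr5_nt': 432, 'utr3_nt': 1144}
```

So gene g1 is predicted as a two-exon transcript whose coding part (846 nt) lies entirely in the second exon. The first exon and the start of the second are 5' UTR, and the rest of the second exon is 3' UTR.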

However, your output is most likely bogus and cannot be used, because you used human training data (at least that is what the --species=human switch indicates), but your genome is fungal. This doesn't work. For predicting eukaryotic genes, you need appropriate training data from your organism or from closely related organisms, e.g. RNA-seq data, full-length cDNA, protein sequences from related organisms, etc.

If one BLASTs the predicted protein sequence from your example, one gets only very weak hits, none of them significant; of course, this might be an exception. You could check all predictions this way if you don't believe me about the importance of training data: a very large proportion of the predicted proteins might not have significant hits, which would indicate that the prediction is not good.
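
The check described above can be sketched in Python as follows. It assumes the BLASTP results were saved in the standard 12-column tabular format (`-outfmt 6`), where column 1 is the query ID and column 11 is the E-value. The function name, the E-value cutoff of 1e-5, and the four example rows are illustrative assumptions, not real results.

```python
def fraction_with_hit(protein_ids, blast_tab, max_evalue=1e-5):
    """Return the fraction of predicted proteins with a significant hit.

    Hypothetical helper: blast_tab is the text of a BLAST tabular file,
    max_evalue is an assumed significance cutoff.
    """
    supported = set()
    for line in blast_tab.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if float(cols[10]) <= max_evalue:  # column 11 = E-value
            supported.add(cols[0])         # column 1 = query ID
    wanted = set(protein_ids)
    return len(supported & wanted) / len(wanted)


def _row(query, subject, evalue, bitscore):
    # Only the query ID and E-value matter here; the alignment columns
    # (pident, length, mismatch, ...) are filled with placeholder values.
    return "\t".join([query, subject, "30.0", "100", "70", "0",
                      "1", "100", "1", "100", evalue, bitscore])


# Invented example data: four predictions, one of which has no hit at all.
PREDICTED_IDS = ["g1.t1", "g2.t1", "g3.t1", "g4.t1"]
BLAST_TAB = "\n".join([
    _row("g1.t1", "subjA", "0.5", "30.1"),    # weak, not significant
    _row("g2.t1", "subjB", "1e-50", "210.0"), # significant
    _row("g3.t1", "subjC", "2.1", "28.5"),    # weak, not significant
])

print(fraction_with_hit(PREDICTED_IDS, BLAST_TAB))  # 0.25
```

A low fraction across all predicted proteins would support the suspicion that the gene models are poor. A high fraction on its own does not prove that the exon boundaries are right.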

If you want a state-of-the-art gene prediction, you should look at pipelines like MAKER, which combine several tools, such as Augustus and SNAP, and integrate evidence, proper repeat masking, and re-training.


Thank you for the information about Augustus.

Actually, my purpose in doing this is to annotate the fungal genome. I am not a bioinformatician. I have looked at MAKER, but it requires many dependencies, and I have been unable to make it work.

Is there a server or some easy-to-use software with which I can annotate a fungal genome?



Reply written 5.0 years ago by bioinformaticssrm2011