What Damian describes is very similar to what I'm doing with funannotate, a package of scripts that ties together all of the different tools and data formats. Right now it is specific to fungi (but I think I could make it more general if there is interest). It is located here: https://github.com/nextgenusfs/funannotate.
Funannotate is composed of 3 main scripts: predict, annotate, and compare. It streamlines the whole process and produces NCBI-ready WGS submission files. Each script runs the steps listed here and converts the output of the different tools into the proper format for downstream processing.
predict:
1) RepeatModeler de novo detection of repeats
2) RepeatMasker soft-masks repeats
3) Aligns protein evidence to genome using tblastn/exonerate (default is the UniProtKB database, but multiple sources can be used)
4) Aligns transcripts to genome using GMAP (can be a variety of sources, e.g. Trinity/PASA, closely related species ESTs, etc.)
5) Trains/runs Augustus (if RNA-seq BAM file passed then uses BRAKER1 to train, if PASA/transdecoder GFF passed uses that, otherwise uses BUSCO prediction to train)
6) Runs GeneMark-ET/ES (GeneMark-ET via BRAKER1, otherwise self-training GeneMark-ES)
7) EVidenceModeler constructs the best gene models from all predictions (Augustus, GeneMark, PASA (optional), protein alignments, and transcript alignments)
8) tRNAscan-SE predicts tRNAs
9) Filters out transposons and "bad" gene models (internal stops, etc)
10) Makes a GenBank .tbl file using Genome Annotation Generator (GAG) and converts it to a GenBank flat file using tbl2asn
11) Parses the tbl2asn error report, removing problematic gene models
12) Finally, runs the cleaned gene models back through GAG -> tbl2asn
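The training choice in steps 5 and 6 is just a priority order over whatever evidence was passed in. Here is a minimal sketch of that decision in Python. The function names, argument names, and returned labels are made up for illustration and are not funannotate's actual code or command-line options.

```python
def choose_augustus_training(rnaseq_bam=None, pasa_gff=None):
    """Pick how Augustus gets trained, following the priority in step 5.

    Both arguments are hypothetical file paths (or None if not supplied).
    """
    if rnaseq_bam:
        # RNA-seq alignments available: train through BRAKER1
        return "BRAKER1"
    if pasa_gff:
        # No BAM, but PASA/transdecoder gene models were supplied
        return "PASA"
    # No evidence passed: fall back to BUSCO predictions
    return "BUSCO"


def choose_genemark_mode(rnaseq_bam=None):
    """Pick the GeneMark flavor, following step 6."""
    # GeneMark-ET runs as part of BRAKER1 when a BAM file is present;
    # otherwise GeneMark-ES trains itself on the genome alone.
    return "GeneMark-ET" if rnaseq_bam else "GeneMark-ES"


if __name__ == "__main__":
    # Hypothetical inputs: only a PASA GFF was supplied
    print(choose_augustus_training(pasa_gff="pasa.gff3"))
    print(choose_genemark_mode())
```

Keeping the decision in small functions like this makes the fallback order easy to read and to test separately from the tools themselves.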
annotate:
1) Assigns functional annotation using PFAM, InterPro, UniProtKB, MEROPS proteases, CAZymes, Gene Ontology (GO) terms, BUSCO models
2) Incorporates functional annotation using GAG
3) Creates GenBank submission files using tbl2asn and other scripts (.sqn, .tbl, .agp, .contigs)
compare:
1) Parses functional annotation for each genome, making graphs, plots, etc.
2) Runs ProteinOrtho5 clustering tool to find orthologous groups
3) Runs RAxML to generate a phylogeny from randomly chosen BUSCO single-copy orthologs
4) Finally outputs all the data into HTML format
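To give an idea of how the random BUSCO selection for the phylogeny could work, here is a rough Python sketch. It assumes the BUSCO results have already been parsed into a dictionary of genome -> BUSCO ID -> list of matching gene IDs. That data layout and the function name are assumptions for illustration, not funannotate's internals.

```python
import random


def pick_shared_single_copy(busco_hits, n, seed=None):
    """Randomly choose up to n BUSCO IDs that are single copy in every genome.

    busco_hits: {genome_name: {busco_id: [gene_id, ...]}} (hypothetical layout)
    """
    shared = None
    for hits in busco_hits.values():
        # BUSCO models with exactly one matching gene in this genome
        single = {busco for busco, genes in hits.items() if len(genes) == 1}
        # Keep only the models that are single copy in all genomes seen so far
        shared = single if shared is None else shared & single
    # Sort so that a fixed seed gives a reproducible choice
    candidates = sorted(shared or [])
    rng = random.Random(seed)
    return rng.sample(candidates, min(n, len(candidates)))


if __name__ == "__main__":
    # Toy example with made-up IDs
    hits = {
        "genomeA": {"BUSCO1": ["a1"], "BUSCO2": ["a2", "a3"], "BUSCO3": ["a4"]},
        "genomeB": {"BUSCO1": ["b1"], "BUSCO2": ["b2"], "BUSCO3": ["b3"]},
    }
    print(pick_shared_single_copy(hits, n=2, seed=1))
```

The proteins for the chosen models would then be aligned and concatenated before being handed to RAxML; that part is not shown here.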