I often play with draft genomes of non-model species (mostly fishes), and we need to annotate these genomes. In cases like this, we do not really care about putative proteins predicted from ORFs or by ab initio methods.
What we really need is a GFF3 annotation file listing known proteins (from Swiss-Prot, for example), with an accompanying .csv file that gives more information about each protein (scaffold, position, protein name, etc.).
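For the .csv part, once a GFF3 exists the summary table can be derived from it directly. A minimal Python sketch using only the standard library, assuming the protein name is stored in a `product` attribute (falling back to `Name`); that attribute name, the `prot2genome` source label and the example record are hypothetical and depend on whichever tool writes the GFF3:

```python
import csv
import io
from urllib.parse import unquote


def parse_attributes(column9: str) -> dict:
    """Parse the GFF3 attribute column (key=value pairs separated by ';')."""
    attrs = {}
    for field in column9.strip().split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        # GFF3 percent-encodes reserved characters such as ';', '=' and ','
        attrs[key] = unquote(value)
    return attrs


def gff3_to_csv(gff_text: str, feature_type: str = "gene") -> str:
    """Summarise one feature type of a GFF3 string as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["scaffold", "start", "end", "strand", "id", "protein_name"])
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, ## directives and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue  # also skips any ##FASTA sequence lines
        attrs = parse_attributes(cols[8])
        writer.writerow([
            cols[0], int(cols[3]), int(cols[4]), cols[6],
            attrs.get("ID", ""),
            # 'product' is an assumed attribute name; fall back to Name
            attrs.get("product", attrs.get("Name", "")),
        ])
    return out.getvalue()


example = (
    "##gff-version 3\n"
    "scaffold_12\tprot2genome\tgene\t1000\t5000\t.\t+\t.\t"
    "ID=gene0001;product=Myosin-7%2C cardiac\n"
    "scaffold_12\tprot2genome\tmRNA\t1000\t5000\t.\t+\t.\t"
    "ID=mrna0001;Parent=gene0001\n"
)
print(gff3_to_csv(example))
```

Passing `feature_type="mRNA"` gives one row per transcript instead of per gene, which may suit loci with several isoforms better.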
What would be the simplest approach to achieve this while handling introns/exons properly and producing the usual feature types (gene, CDS, exon, UTR...)?
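Handling introns/exons properly mostly comes down to keeping the gene / mRNA / exon / CDS / UTR hierarchy consistent. Given the exon and CDS coordinates of one transcript (for example from a spliced protein-to-genome alignment), introns and UTRs follow from simple interval arithmetic. The sketch below assumes 1-based, closed GFF3 coordinates and one transcript at a time; the function names are hypothetical. Note that protein evidence alone only covers the coding part, so UTRs generally need transcript evidence (RNA-seq, ESTs) to place exonic sequence beyond the CDS.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, closed [start, end], as in GFF3


def introns(exons: List[Interval]) -> List[Interval]:
    """Gaps between consecutive exons of one transcript."""
    ordered = sorted(exons)
    return [
        (a_end + 1, b_start - 1)
        for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])
        if b_start - a_end > 1  # adjacent/abutting exons leave no intron
    ]


def utrs(exons: List[Interval], cds: List[Interval],
         strand: str) -> List[Tuple[str, int, int]]:
    """Exonic sequence outside the CDS span, labelled with SO feature types."""
    cds_start = min(s for s, _ in cds)
    cds_end = max(e for _, e in cds)
    # On the minus strand the 5' UTR lies at the higher coordinates
    if strand == "+":
        left, right = "five_prime_UTR", "three_prime_UTR"
    else:
        left, right = "three_prime_UTR", "five_prime_UTR"
    result = []
    for start, end in sorted(exons):
        if start < cds_start:  # exon (part) upstream of the CDS in genome order
            result.append((left, start, min(end, cds_start - 1)))
        if end > cds_end:      # exon (part) downstream of the CDS
            result.append((right, max(start, cds_end + 1), end))
    return result
```

For exons `[(100, 200), (300, 400), (500, 650)]` and CDS `[(150, 200), (300, 400), (500, 600)]` on the plus strand, this yields introns at 201-299 and 401-499, a 5' UTR at 100-149 and a 3' UTR at 601-650.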
Right now, I am considering a workflow like this:
- RepeatMasker
- EVidenceModeler (EVM)
And skipping anything to do with ab initio prediction (Augustus, exonerate...).
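If EVM is used with protein evidence only, its weights file simply lists no ABINITIO_PREDICTION sources. A small sketch that writes such a file (tab-separated: evidence class, source, weight) and checks that every source seen in an evidence GFF3 has a weight. The source labels are placeholders that would have to match column 2 of the GFF3 files actually fed to EVM, and the class names should be checked against the EVM documentation for the installed version:

```python
from typing import Dict, Set, Tuple

# Evidence classes used in EVidenceModeler weights files (verify against
# the documentation of the EVM version you install).
EVM_CLASSES = {"ABINITIO_PREDICTION", "PROTEIN", "TRANSCRIPT", "OTHER_PREDICTION"}

Weights = Dict[Tuple[str, str], float]  # (evidence class, source) -> weight


def weights_file(weights: Weights, allow_ab_initio: bool = False) -> str:
    """Render a tab-separated weights file, dropping ab initio sources by default."""
    lines = []
    for (ev_class, source), weight in sorted(weights.items()):
        if ev_class not in EVM_CLASSES:
            raise ValueError(f"unknown evidence class: {ev_class}")
        if ev_class == "ABINITIO_PREDICTION" and not allow_ab_initio:
            continue  # protein-evidence-only run: skip ab initio predictors
        lines.append(f"{ev_class}\t{source}\t{weight:g}")
    return "".join(line + "\n" for line in lines)


def sources_in_gff3(gff_text: str) -> Set[str]:
    """Collect column 2 (source) from the feature lines of a GFF3 string."""
    return {
        line.split("\t")[1]
        for line in gff_text.splitlines()
        if line and not line.startswith("#") and line.count("\t") == 8
    }


def missing_weights(gff_text: str, weights: Weights) -> Set[str]:
    """Sources present in the evidence GFF3 but absent from the weights."""
    weighted = {source for _, source in weights}
    return sources_in_gff3(gff_text) - weighted
```

Raising instead of silently skipping an unknown class is deliberate: a typo in a class name would otherwise quietly remove that evidence from the run.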
Am I missing a simpler approach? It needs to work for eukaryotic genomes of ~1-3 Gbp.
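At 1-3 Gbp, alignment and EVM steps are commonly run in parallel on chunks of the assembly rather than on the whole genome at once. A hedged sketch of one simple way to group scaffolds into batches of roughly bounded size, reading lengths from a samtools `.fai` index (column 1 is the sequence name, column 2 its length); the helper names and any batch size you pick are arbitrary assumptions:

```python
from typing import Dict, List


def lengths_from_fai(fai_text: str) -> Dict[str, int]:
    """Scaffold name -> length from the text of a samtools .fai index."""
    lengths = {}
    for line in fai_text.splitlines():
        if line.strip():
            cols = line.split("\t")
            lengths[cols[0]] = int(cols[1])
    return lengths


def batch_scaffolds(lengths: Dict[str, int], max_bp: int) -> List[List[str]]:
    """Greedily group scaffolds, in input order, into batches of <= max_bp.

    A scaffold longer than max_bp still gets a batch of its own, since
    scaffolds are never split here (splitting would cut gene models).
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_bp = 0
    for name, length in lengths.items():
        if current and current_bp + length > max_bp:
            batches.append(current)
            current, current_bp = [], 0
        current.append(name)
        current_bp += length
    if current:
        batches.append(current)
    return batches
```

Each batch could then be written to its own FASTA and processed independently, with the per-batch GFF3 outputs concatenated afterwards.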
EDIT: Ah well... Please do not suggest MAKER 1 or 2. I am not going to use MAKER unless my actual survival depends on it ;)