I've submitted an annotated assembly to GenBank, and their review found some problems I need to address. The annotations are on a draft assembly: contigs organized into scaffolds. The annotation was done using MAKER on the scaffold FASTA file. The problem GenBank has pointed out is that some of my CDS fall very close to gaps, i.e. my CDS are within 3 bp of either the end of a scaffold or the end of a contig within a scaffold. The contigs are defined as the segments of a scaffold separated by stretches of 10+ Ns.
GenBank flagged 575 of my CDS with this problem, so there are too many to fix manually. I would like a programmatic fix, but the code to handle this would be challenging for me, so before I start on a script I thought I would ask whether anyone already has code for this. The annotations I need to correct are in GFF3 format, so changes to the CDS may also require changes to the parent features (exon, mRNA and gene).
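To make the problem concrete, here is a rough sketch of just the detection step, not the fix itself. It assumes the scaffolds are already loaded into a dict of name to sequence, that a gap is any run of 10 or more Ns (upper or lower case), and that "within 3 bp" means at most 3 bases lie between the CDS boundary and the gap or scaffold end. That last definition is my own reading and may not match GenBank's exact rule. The function names `find_gaps` and `flag_cds_near_edges` are made up for illustration.

```python
import re

# Assumption: a gap is a run of 10 or more Ns (soft-masked "n" included).
GAP_RE = re.compile(r"[Nn]{10,}")

def find_gaps(seq):
    """Return 1-based inclusive (start, end) coordinates of each gap."""
    return [(m.start() + 1, m.end()) for m in GAP_RE.finditer(seq)]

def flag_cds_near_edges(gff_lines, scaffolds, margin=3):
    """Report CDS rows lying within `margin` bases of a gap or scaffold end.

    gff_lines: iterable of GFF3 lines; scaffolds: dict of seqid -> sequence.
    Returns a list of (seqid, start, end, reason, distance) tuples.
    """
    gap_index = {name: find_gaps(seq) for name, seq in scaffolds.items()}
    flagged = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS" or cols[0] not in scaffolds:
            continue
        seqid, start, end = cols[0], int(cols[3]), int(cols[4])
        length = len(scaffolds[seqid])
        # Distance = number of bases between the CDS and the boundary.
        candidates = [("scaffold_start", start - 1),
                      ("scaffold_end", length - end)]
        for gap_start, gap_end in gap_index[seqid]:
            if end < gap_start:
                candidates.append(("gap", gap_start - end - 1))
            elif start > gap_end:
                candidates.append(("gap", start - gap_end - 1))
            else:
                candidates.append(("gap", 0))  # CDS overlaps the gap
        for reason, dist in candidates:
            if dist <= margin:
                flagged.append((seqid, start, end, reason, dist))
    return flagged
```

This only lists the offending CDS with their distance to the nearest boundary. Actually trimming or extending them, and then propagating the new coordinates to the parent exon, mRNA and gene rows via the `Parent` attribute, is the part that still needs to be written.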