I've wrapped up an assembly and will soon be uploading a new genome and its annotations to NCBI, but I'm having a little trouble getting everything packaged for GenBank.
I have a single .fsa file and a single .gff3 file with my genome information that I am trying to run through the table2asn_GFF tool, but I am getting errors.
I am running table2asn_GFF like this:
./linux64.table2asn_GFF -i myassembly.fsa -t mytemplate.sbt -J -c w -euk -locus-tag-prefix GQ602 -M n -Z -f myannotations.gff -outdir output_dir
I get an error about my protein IDs, and I'm not sure why:
FEATURE_COUNT: CDS: 7455 present
FEATURE_COUNT: gene: 7455 present
FEATURE_COUNT: mRNA: 7455 present
FATAL: MISSING_PROTEIN_ID: 7455 proteins have invalid IDs.
A bit of my gff:
##gff-version 3
##sequence-region scaffold_01 1 5595695
scaffold_01 FGDB gene 7249 9339 . + . ID=Ophcf2|00001|gene
scaffold_01 FGDB mRNA 7249 9339 . + . ID=Ophcf2|00001;Parent=Ophcf2|00001|gene;proteinId=Ophcf2|00001;Name=Ophcf2|00001
scaffold_01 FGDB exon 7249 7255 . + . ID=Ophcf2|00001|exon1;Parent=Ophcf2|00001
scaffold_01 FGDB exon 7334 9339 . + . ID=Ophcf2|00001|exon2;Parent=Ophcf2|00001
scaffold_01 FGDB CDS 7249 7255 . + 0 ID=Ophcf2|00001|CDS;Parent=Ophcf2|00001
scaffold_01 FGDB CDS 7334 9339 . + 2 ID=Ophcf2|00001|CDS;Parent=Ophcf2|00001
Perhaps something to do with my mRNA ID and proteinId being the same?
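To narrow this down, here is a minimal sketch of a check I could run over the file. It is plain Python with no dependencies, and `parse_attributes` and `summarize_gff` are my own helper names, not part of table2asn. It assumes the real file is tab-separated (the excerpt above lost its tabs when pasted), and it only tallies which attribute keys each feature type carries and how many ID, proteinId, or protein_id values contain a `|`. I have not confirmed that the attribute spelling or the pipe characters are what table2asn objects to; this only shows what is actually in the file.

```python
from collections import Counter

def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs

def summarize_gff(lines):
    """Tally attribute keys per feature type, and how many ID-like values contain '|'."""
    keys_seen = Counter()
    piped = Counter()
    for line in lines:
        # Skip blank lines, directives (##...) and comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed 9-column feature line
        ftype = cols[2]
        for key, value in parse_attributes(cols[8]).items():
            keys_seen[(ftype, key)] += 1
            if key in ("ID", "proteinId", "protein_id") and "|" in value:
                piped[(ftype, key)] += 1
    return keys_seen, piped

# A few lines from my file, with tabs restored (assumption: the real file uses tabs).
example = [
    "##gff-version 3\n",
    "scaffold_01\tFGDB\tgene\t7249\t9339\t.\t+\t.\tID=Ophcf2|00001|gene\n",
    "scaffold_01\tFGDB\tmRNA\t7249\t9339\t.\t+\t.\t"
    "ID=Ophcf2|00001;Parent=Ophcf2|00001|gene;proteinId=Ophcf2|00001;Name=Ophcf2|00001\n",
    "scaffold_01\tFGDB\tCDS\t7249\t7255\t.\t+\t0\tID=Ophcf2|00001|CDS;Parent=Ophcf2|00001\n",
]

keys_seen, piped = summarize_gff(example)
for (ftype, key), n in sorted(keys_seen.items()):
    print(ftype, key, n, "containing '|':", piped[(ftype, key)])
```

On the full file I would pass `open("myannotations.gff")` in place of the `example` list. For the lines above, the tally shows that proteinId appears only on the mRNA line, that the CDS line has no protein ID attribute of its own, and that every ID contains a `|`.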
I do plan to add product=* attributes to my CDS features, but only once I can get this first version to work.
(I've also poked around a bit with GAG, but I'm getting errors there that I don't fully understand yet. That's another topic.)
A nudge in the right direction would be greatly appreciated, thanks!