Hey everyone,
I've recently used GeneMark to predict genes de novo in an assembly I'm working on. I installed GeneMark-ES / ET v.4.33.
The run generated a GTF, but the first column uses different scaffold names from my own. My scaffolds are simply labelled "scaffold1, scaffold2, scaffold3 etc.". An example of the GeneMark-ES GTF output is below:
1_dna GeneMark.hmm exon 137 283 0 + . gene_id "1_g"; transcript_id "1_t";
1_dna GeneMark.hmm CDS 137 283 . + 1 gene_id "1_g"; transcript_id "1_t";
1_dna GeneMark.hmm exon 307 344 0 + . gene_id "1_g"; transcript_id "1_t";
1_dna GeneMark.hmm CDS 307 344 . + 1 gene_id "1_g"; transcript_id "1_t";
1_dna GeneMark.hmm exon 371 543 0 + . gene_id "1_g"; transcript_id "1_t";
Ideally, I'd want something like this (note I don't know if 1_dna maps to scaffold 1 - this is just an example):
scaffold1 GeneMark.hmm exon 137 283 0 + . gene_id "1_g"; transcript_id "1_t";
scaffold1 GeneMark.hmm CDS 137 283 . + 1 gene_id "1_g"; transcript_id "1_t";
scaffold1 GeneMark.hmm exon 307 344 0 + . gene_id "1_g"; transcript_id "1_t";
scaffold1 GeneMark.hmm CDS 307 344 . + 1 gene_id "1_g"; transcript_id "1_t";
scaffold1 GeneMark.hmm exon 371 543 0 + . gene_id "1_g"; transcript_id "1_t";
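To make the goal concrete: if I had a table linking the GeneMark IDs to my scaffold names, I imagine the fix would just be swapping the first column. Below is a minimal sketch of that idea. The function name `rename_seqids` and the `id_map` dictionary are my own placeholders, the 1_dna to scaffold1 pairing is purely illustrative, and I'm assuming the GTF is tab-separated. What I'm missing is where the real mapping comes from.

```python
def rename_seqids(gtf_lines, id_map):
    """Swap the first (sequence name) column of tab-separated GTF lines.

    id_map is a hypothetical dict of GeneMark ID -> original scaffold name.
    IDs that are not in id_map are left unchanged.
    """
    renamed = []
    for line in gtf_lines:
        # Pass comment lines and blank lines through untouched.
        if line.startswith("#") or not line.strip():
            renamed.append(line)
            continue
        # Split off the first column only; keep the rest of the line as-is.
        seqid, sep, rest = line.partition("\t")
        renamed.append(id_map.get(seqid, seqid) + sep + rest)
    return renamed


if __name__ == "__main__":
    # Illustrative mapping only; I don't know the real correspondence.
    example_map = {"1_dna": "scaffold1"}
    example = ['1_dna\tGeneMark.hmm\texon\t137\t283\t0\t+\t.\tgene_id "1_g"; transcript_id "1_t";\n']
    for out_line in rename_seqids(example, example_map):
        print(out_line, end="")
```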
My command was simply:
gmes_petap.pl --ES --cores 6 --sequence assembly.fasta
Any tips/insights into what might be happening would be appreciated.
Cheers