Question: Training GeneMark-ES: Data format error; gmhmme3 : warning, evidence_training.gff line ignored
Asked 10 months ago by gabri

Hi All,

I'm trying to train GeneMark-ES so that I can use it in a subsequent analysis with Maker2. I'm using a sample of sequences and a GFF of CDS features obtained by mapping some RNAseq data onto these sequences with Trinity. I used Geneious to obtain the GFF.

This is a portion of my gff:

##gff-version 3

##source-version geneious 10.2.3

FCD_0297 Geneious CDS 3874157 3874871 . - . Name=TRINITY_DN13090_c0_g1_i1

FCD_0297 Geneious CDS 3873567 3873636 . - . Name=TRINITY_DN13090_c0_g1_i1

FCD_0297 Geneious CDS 3873404 3873489 . - . Name=TRINITY_DN13090_c0_g1_i1

FCD_0297 Geneious CDS 440175 440212 . + . Name=TRINITY_DN16051_c0_g1_i1

FCD_0297 Geneious CDS 439015 439129 . + . Name=TRINITY_DN16051_c0_g1_i1

FCD_0297 Geneious CDS 438757 438864 . + . Name=TRINITY_DN16051_c0_g1_i1

FCD_0144 Geneious CDS 769114 769734 . - . Name=TRINITY_DN14911_c3_g2_i1

FCD_0144 Geneious CDS 768367 769024 . - . Name=TRINITY_DN14911_c3_g2_i1

FCD_0144 Geneious CDS 766199 766381 . - . Name=TRINITY_DN14911_c3_g2_i1

This is the command line I used for GeneMark-ES:

$ perl gmes_petap.pl --ES --training --evidence evidence.gff --sequence sequences.fasta

The process gave me hundreds of errors like these:

Data format error: dna.fa_1 Geneious CDS 946644 946847 . + . Name=TRINITY_DN17542_c0_g1_i1

gmhmme3 : warning, file /data/evidence_training.gff line ignored : dna.fa_1 Geneious CDS 946644 946847 . + . Name=TRINITY_DN17542_c0_g1_i1

Data format error: dna.fa_1 Geneious CDS 3185913 3185948 . - . Name=TRINITY_DN3711_c1_g1_i1

gmhmme3 : warning, file data/evidence_training.gff line ignored : dna.fa_1 Geneious CDS 3185913 3185948 . - . Name=TRINITY_DN3711_c1_g1_i1

Data format error: dna.fa_1 Geneious CDS 3185633 3185827 . - . Name=TRINITY_DN3711_c1_g1_i1

gmhmme3 : warning, file /data/evidence_training.gff line ignored : dna.fa_1 Geneious CDS 3185633 3185827 . - . Name=TRINITY_DN3711_c1_g1_i1

Despite these errors, the process finished and produced the following output.

"data" folder:

dna.fna, evidence_training.gff, training.fna

"training" folder:

dna.fa_1, dna.fa_2

"run" folder:

ini.mod, ES_A.mod, ES_B.mod, ES_C.mod

My questions are:

1) Is it correct to supply the evidence file in GFF format, or does the program require another format? Is there something wrong with my GFF file that generates these errors?

2) Did the training finish without using my gff?

3) Which ".mod" file is supposed to be the input for Maker2?
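
For question 1, one way to rule out basic formatting problems would be to check each feature line against the general GFF3 column rules: nine tab-separated columns, start not greater than end, a valid strand, and a phase of 0, 1 or 2 on CDS rows (the rows pasted above show "." in the phase column). Below is a minimal sketch of such a check. The function names `check_gff3_line` and `check_gff3` are made up for illustration, and it is only an assumption that any of these rules corresponds to what gmhmme3 actually rejects.

```python
# Minimal GFF3 sanity check (illustrative sketch, not part of GeneMark-ES).
# It only tests generic GFF3 column rules; whether gmhmme3 applies the
# same rules is an assumption.

VALID_STRANDS = {"+", "-", ".", "?"}
VALID_PHASES = {"0", "1", "2"}


def check_gff3_line(line):
    """Return a list of problems found in a single GFF3 feature line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        # Spaces instead of tabs end up here, because the split finds one field.
        return [f"expected 9 tab-separated columns, found {len(fields)}"]

    seqid, source, ftype, start, end, score, strand, phase, attrs = fields
    problems = []
    if not (start.isdigit() and end.isdigit()):
        problems.append("start/end are not integers")
    elif int(start) > int(end):
        problems.append("start is greater than end")
    if strand not in VALID_STRANDS:
        problems.append(f"unexpected strand {strand!r}")
    if ftype == "CDS" and phase not in VALID_PHASES:
        problems.append(f"CDS phase is {phase!r}; GFF3 expects 0, 1 or 2")
    return problems


def check_gff3(lines):
    """Return (line_number, problem) pairs, skipping blank and '#' lines."""
    report = []
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        for problem in check_gff3_line(line):
            report.append((number, problem))
    return report
```

It could be run on the evidence file with something like `for n, p in check_gff3(open("evidence.gff")): print(n, p)`. Note that a clean report would only show the file is structurally valid GFF3, not that GeneMark-ES accepts it.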

Thank you in advance for any advice!
