Question: GeneMark-ES GTF output conversion to mod
5.1 years ago by
rob234king570
UK/Harpenden/Rothamsted Research
rob234king570 wrote:

I'm running Maker2 to annotate a genome but need to train GeneMark-ES first. I ran eukaryotic GeneMark.hmm using the Perl script below. It finished and produced the GTF file, but Maker2 requires a mod file and the mod folder is empty.

 /home/apps/scripts/gm_es.pl ../RR_1.7b.fasta

How do I convert the GTF output file to the mod format, or have I missed a step or an error? The documentation doesn't seem to explain this.

Some errors are reported during the process. The first part of the output has no errors (see below):

    running hmm2nt.a2
4 files IN
Clusters were defined as:
 0 <= GC% <= 99
99 < GC% <= 99
99 < GC% <= 100

Parsing dna.fa.good.cod

Program complete
----------------
6442 sequences found
6443 dna.fa.good.ini
first order for Ini
GC Range: (0,99)
6442 sequences of length 12 used from 6442
total sequences in dna.fa.good.ini
Generating model...
TT     0.25 0.35 0.16 0.19 0.27 0.15 0.00 0.00 0.00 0.00 0.28 0.27
TC     0.25 0.35 0.55 0.23 0.36 0.58 0.00 0.00 0.00 0.00 0.53 0.32
TA     0.25 0.16 0.15 0.29 0.20 0.14 1.00 0.00 0.00 0.00 0.10 0.13
TG     0.25 0.14 0.14 0.30 0.17 0.13 0.00 0.00 1.00 0.00 0.08 0.28
CT     0.25 0.30 0.20 0.11 0.28 0.14 0.00 0.00 0.00 0.00 0.31 0.35
CC     0.25 0.30 0.35 0.10 0.26 0.40 0.00 0.00 0.00 0.00 0.30 0.25
CA     0.25 0.26 0.32 0.62 0.27 0.33 1.00 0.00 0.00 0.00 0.23 0.21
CG     0.25 0.14 0.13 0.17 0.19 0.13 0.00 0.00 0.00 0.00 0.15 0.20
AT     0.25 0.27 0.20 0.15 0.23 0.18 0.00 1.00 0.00 0.00 0.24 0.23
AC     0.25 0.27 0.35 0.16 0.32 0.30 0.00 0.00 0.00 0.00 0.27 0.27
AA     0.25 0.29 0.23 0.43 0.36 0.30 1.00 0.00 0.00 0.00 0.31 0.22
AG     0.25 0.18 0.21 0.25 0.10 0.23 0.00 0.00 0.00 0.00 0.19 0.29
GT     0.25 0.24 0.25 0.21 0.19 0.21 0.00 0.00 0.00 0.18 0.15 0.30
GC     0.25 0.33 0.34 0.23 0.35 0.34 0.00 0.00 0.00 0.20 0.41 0.35
GA     0.25 0.28 0.25 0.34 0.30 0.28 1.00 0.00 0.00 0.26 0.29 0.21
GG     0.25 0.16 0.16 0.22 0.16 0.18 0.00 0.00 0.00 0.36 0.15 0.15
Done
6443 lines read from dna.fa.good.ter
6442 sequences obtained
1 comment lines
0 lines contained no sequence (or improperly formatted seq)
2445 sequences used TAA
1802 sequences used TAG
2195 sequences used TGA
0 sequences did not begin with a stop codon
All lines accounted for
Done
2445 dna.fa.good.taa
first order for TAA
GC Range: (0,99)
2445 sequences of length 12 used from 2445
total sequences in dna.fa.good.taa
Generating model...
TT     1.00 0.00 0.00 0.00 0.32 0.30 0.28 0.30 0.32 0.29 0.30 0.28
TC     0.00 0.00 0.00 0.00 0.16 0.19 0.15 0.20 0.21 0.21 0.21 0.18
TA     0.00 1.00 0.00 0.00 0.26 0.26 0.25 0.25 0.21 0.24 0.26 0.26
TG     0.00 0.00 0.00 0.00 0.25 0.25 0.32 0.25 0.26 0.26 0.23 0.28
CT     1.00 0.00 0.00 0.00 0.26 0.26 0.27 0.28 0.33 0.30 0.26 0.27
CC     0.00 0.00 0.00 0.00 0.17 0.15 0.20 0.20 0.19 0.20 0.18 0.20
CA     0.00 0.00 0.00 0.00 0.37 0.34 0.35 0.38 0.30 0.33 0.37 0.39
CG     0.00 0.00 0.00 0.00 0.20 0.25 0.19 0.14 0.19 0.17 0.18 0.14
AT     1.00 0.00 0.00 0.20 0.26 0.31 0.26 0.28 0.29 0.31 0.29 0.33
AC     0.00 0.00 0.00 0.15 0.22 0.15 0.18 0.22 0.22 0.23 0.22 0.22
AA     0.00 0.00 1.00 0.29 0.26 0.28 0.30 0.29 0.27 0.24 0.30 0.23
AG     0.00 0.00 0.00 0.36 0.25 0.25 0.25 0.21 0.22 0.23 0.19 0.22
GT     1.00 0.00 0.00 0.00 0.46 0.22 0.22 0.25 0.25 0.23 0.25 0.26
GC     0.00 0.00 0.00 0.00 0.20 0.17 0.19 0.19 0.19 0.21 0.18 0.22
GA     0.00 0.00 0.00 0.00 0.20 0.36 0.34 0.31 0.33 0.34 0.31 0.32
GG     0.00 0.00 0.00 0.00 0.15 0.26 0.25 0.25 0.22 0.22 0.26 0.20
Done
1802 dna.fa.good.tag
zero order for TAG
GC Range: (0,99)
1802 sequences of length 12 used from 1802
total sequences in dna.fa.good.tag
Generating model...
T    1.00 0.00 0.00 0.17 0.33 0.28 0.25 0.28 0.31 0.29 0.29 0.32
C    0.00 0.00 0.00 0.13 0.20 0.15 0.17 0.18 0.18 0.18 0.19 0.17
A    0.00 1.00 0.00 0.48 0.26 0.30 0.30 0.30 0.26 0.29 0.29 0.28
G    0.00 0.00 1.00 0.22 0.21 0.27 0.27 0.24 0.25 0.25 0.24 0.23
Done
2195 dna.fa.good.tga
first order for TGA
GC Range: (0,99)
2195 sequences of length 12 used from 2195
total sequences in dna.fa.good.tga
Generating model...
TT     1.00 0.00 0.00 0.00 0.30 0.32 0.31 0.29 0.30 0.32 0.28 0.31
TC     0.00 0.00 0.00 0.00 0.15 0.14 0.13 0.18 0.14 0.22 0.19 0.16
TA     0.00 0.00 0.00 0.00 0.24 0.23 0.26 0.22 0.25 0.21 0.26 0.28
TG     0.00 1.00 0.00 0.00 0.31 0.31 0.30 0.31 0.30 0.25 0.28 0.25
CT     1.00 0.00 0.00 0.00 0.29 0.29 0.29 0.30 0.28 0.28 0.28 0.28
CC     0.00 0.00 0.00 0.00 0.22 0.14 0.19 0.20 0.20 0.18 0.15 0.15
CA     0.00 0.00 0.00 0.00 0.32 0.37 0.34 0.36 0.31 0.37 0.35 0.39
CG     0.00 0.00 0.00 0.00 0.17 0.20 0.18 0.14 0.21 0.18 0.21 0.18
AT     1.00 0.00 0.00 0.31 0.20 0.27 0.30 0.29 0.28 0.30 0.25 0.31
AC     0.00 0.00 0.00 0.12 0.22 0.21 0.18 0.22 0.24 0.21 0.19 0.21
AA     0.00 0.00 0.00 0.24 0.27 0.26 0.29 0.25 0.26 0.24 0.28 0.27
AG     0.00 0.00 0.00 0.33 0.31 0.27 0.23 0.24 0.23 0.25 0.28 0.22
GT     1.00 0.00 0.00 0.00 0.28 0.23 0.21 0.24 0.25 0.26 0.25 0.27
GC     0.00 0.00 0.00 0.00 0.19 0.18 0.15 0.20 0.18 0.18 0.19 0.19
GA     0.00 0.00 1.00 0.00 0.26 0.35 0.36 0.30 0.33 0.34 0.32 0.34

But further on I get "Error: unknown line format":

3294 dna.fa.good.gb.acc.ph2
first order for ACC 2
Error: unknown line format
       GC%      Intron  Accession        (File generated at 2014/02/15 Sat 13:38:23 GMT)

GC Range: (0,99)
3293 sequences of length 21 used from 3294
total sequences in dna.fa.good.gb.acc.ph2
Generating model...

Can anyone help me with this error: error, file not found info/training.fna

I ran the following command to generate the gmhmm.mod file for my model:

gmes_petap.pl --ES --cores 4 --sequence test_genome.fasta

written 2.0 years ago by S AR50

Hi angelshiza,

I recently ran into the same error ("error, file not found info/training.fna"). Have you found a solution? Any help is appreciated. Thanks in advance.

written 20 months ago by gauravdube0070

I also had the same error. There are two solutions: change the perl path in all the .pl files by hand, or run the script change_path_in_perl_scripts.pl.

I did it the following way:

victorc:~/bin/gm_et_linux_64/gmes_petap $ ./change_path_in_perl_scripts.pl /home/victor/bin/perl

written 15 days ago by victorcana19910
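
One possible reading of the fix above is that the .pl scripts start with an interpreter line pointing at a perl that does not exist on the machine. A minimal sketch for checking this, assuming each script begins with a `#!` line; `check_perl_shebangs` is a hypothetical helper written for illustration and is not part of GeneMark:

```shell
# Hypothetical helper (not part of GeneMark): list the .pl files in a
# directory whose "#!" interpreter path is missing or not executable.
check_perl_shebangs() {
    dir="${1:-.}"
    for script in "$dir"/*.pl; do
        [ -f "$script" ] || continue
        first_line=$(head -n 1 "$script")
        # Skip files that do not start with an interpreter line.
        case "$first_line" in
            '#!'*) ;;
            *) continue ;;
        esac
        # Drop the leading "#!" and any arguments after the interpreter path.
        interp=$(printf '%s\n' "$first_line" |
            sed -e 's/^#![[:space:]]*//' -e 's/[[:space:]].*$//')
        if [ ! -x "$interp" ]; then
            printf '%s: %s\n' "$script" "$interp"
        fi
    done
}

# Example (path taken from the comment above):
#   check_perl_shebangs ~/bin/gm_et_linux_64/gmes_petap
```

If this prints any scripts, their interpreter line is a candidate for correction, either by hand or with change_path_in_perl_scripts.pl as described above.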

Hi, I was wondering if you found a solution to the above problem? I am now facing exactly the same problem and have no idea how to fix it.

Thanks

written 4.3 years ago by tuanduonganh0
3.2 years ago by
Jon320
United States - US FS
Jon320 wrote:

Hopefully you have this figured out by now, but the mod file required by Maker is located in the GeneMark 'output' directory; by default it is called 'gmhmm.mod'. With the most recent version of GeneMark-ES (4.32), you would run something like the following command:

gmes_petap.pl --ES --cores 4 --sequence test_genome.fasta

The file for maker would be in the output folder, so output/gmhmm.mod.
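
As a quick check after a run, the sketch below confirms that the model file exists and is not empty before handing it to Maker. It assumes the layout described in this answer (`output/gmhmm.mod` under the directory where gmes_petap.pl was run); `find_gmhmm_mod` is a hypothetical helper name, not a GeneMark or Maker command:

```shell
# Hypothetical helper: print the path to the trained model if it exists
# and is non-empty. Assumes the layout <run_dir>/output/gmhmm.mod.
find_gmhmm_mod() {
    run_dir="${1:-.}"
    mod_file="$run_dir/output/gmhmm.mod"
    if [ -s "$mod_file" ]; then
        printf '%s\n' "$mod_file"
    else
        printf 'no non-empty model at %s\n' "$mod_file" >&2
        return 1
    fi
}

# Example: run from the directory where gmes_petap.pl was started
#   find_gmhmm_mod .
```

The printed path is what would go on the `gmhmm=` line of Maker's maker_opts.ctl control file.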
