I want to carry out gene prediction for an isolated strain of the fungus Cochliobolus sativus. As there is no fungal training model available in GlimmerHMM, I am creating one using genomic data from JGI for C. sativus ND90Pr, C. victoriae, C. miyabeanus ATCC 44560 v1.0, C. lunatus, C. heterostrophus and C. carbonum.

When I execute trainGlimmerHMM <multifasta_file> <exon_file>, I get an error for specific lines in my dummy exon file. As far as I can tell, the error occurs only for reverse-strand lines. As described in the README file, I have separated the genes with a blank line and listed the reverse-strand coordinates in descending order. The error I get is:

ERROR 27: Wrong exon coordinates file. Exon file line: scaffold_0 exon 3002 2420

Below is the dummy exon file:

scaffold_0 3002 2420
scaffold_0 2422 2420
scaffold_0 3933 3078
scaffold_0 4219 3995
scaffold_0 4304 4267
scaffold_0 4397 4357
scaffold_0 4699 4450
scaffold_0 5213 5115
scaffold_0 5575 5264
scaffold_0 5724 5633
scaffold_0 5812 5778
scaffold_0 5921 5864
scaffold_0 5921 5919

scaffold_0 6144 6190
scaffold_0 6144 6146
scaffold_0 6247 6394
scaffold_0 6452 6598
scaffold_0 6596 6598

scaffold_0 7222 7310
scaffold_0 7222 7224
scaffold_0 7365 7461
scaffold_0 7526 7927
scaffold_0 7925 7927

scaffold_0 8253 9230
scaffold_0 8253 8255
scaffold_0 9228 9230

If I run the training command with only the forward-strand exon coordinates, the training set is created successfully. Can anyone please point out where I am going wrong?
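One way to narrow this down is to summarise each gene block before training. The sketch below is only an illustration: the function names `parse_blocks` and `check_block` are hypothetical, the three-column layout is taken from the dummy file above, and strand is inferred purely from whether the first coordinate is larger than the second. It reports the strands found in each block and any features whose spans overlap. Whether trainGlimmerHMM actually rejects overlapping features is an assumption, not something confirmed here.

```python
def parse_blocks(text):
    """Split exon-file text into gene blocks separated by blank lines."""
    blocks, current = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            # A blank line closes the current gene block.
            if current:
                blocks.append(current)
                current = []
            continue
        # Assumed layout: sequence name, first coordinate, second coordinate.
        current.append((fields[0], int(fields[1]), int(fields[2])))
    if current:
        blocks.append(current)
    return blocks


def check_block(block):
    """Report the strands seen in a block and any overlapping spans."""
    # Assumption: descending coordinates mean reverse strand.
    strands = {"-" if start > end else "+" for _, start, end in block}
    spans = sorted((min(s, e), max(s, e)) for _, s, e in block)
    overlaps = [(a, b) for a, b in zip(spans, spans[1:]) if b[0] <= a[1]]
    return {"strands": strands, "overlaps": overlaps}


sample = """scaffold_0 3002 2420
scaffold_0 2422 2420
scaffold_0 3933 3078

scaffold_0 6144 6190
scaffold_0 6247 6394
"""

for block in parse_blocks(sample):
    print(check_block(block))
```

On this excerpt the first block is reported as reverse strand with the 3-base feature 2422 2420 overlapping 3002 2420, so the short start/stop-sized features may be worth a closer look.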

Can you check the length of scaffold_0 in the multi-FASTA file?

The length of scaffold_0 is 870365 bases.
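For anyone wanting to repeat this check, a minimal sketch of counting sequence lengths in a multi-FASTA file follows. The function name `fasta_lengths` is hypothetical, and the sketch assumes a plain, uncompressed FASTA whose sequence name is the first word after ">". With every exon coordinate in the dummy file well below 870365, the scaffold length does not appear to be the problem.

```python
def fasta_lengths(text):
    """Return a dict mapping each sequence name to its length in bases."""
    lengths = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


example = ">scaffold_0 example header\nACGT\nAC\n>scaffold_1\nGG\n"
print(fasta_lengths(example))
```

For a real file, the text could be read with `open(path).read()`, or the loop adapted to iterate over the file handle for large genomes.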

The error is most probably generated from this file:

Search it for "ERROR 27: Wrong exon coordinates file. Exon file line". I am not very good at Perl, so I can't say much, but the relevant line is:

my ($anum,$ex1,$ex2)=/^(\S+)\s*([\>|\<]*\d+)\s*([\>|\<]*\d+\s*)$/;

For the failing line, at least one of $anum, $ex1 or $ex2 has not been set properly, presumably because the line does not match this pattern.
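To test individual lines against that pattern without running the trainer, here is a rough Python port of the regular expression. It is only a sketch: `parse_exon_line` is a hypothetical helper, and the port assumes the Perl pattern is applied to one line at a time. Note that the error message in the question quotes the line as "scaffold_0 exon 3002 2420", with an extra "exon" column that the dummy file shown does not have. A line in that form does not match the pattern, while the three-column form does. Whether the real exon file contains such a column cannot be told from the post.

```python
import re

# Port of the Perl pattern quoted above: name, first coordinate, second coordinate.
EXON_LINE = re.compile(r"^(\S+)\s*([>|<]*\d+)\s*([>|<]*\d+\s*)$")


def parse_exon_line(line):
    """Return (name, first, second) as strings, or None if the line does not match."""
    match = EXON_LINE.match(line)
    if match is None:
        return None
    anum, ex1, ex2 = match.groups()
    return anum, ex1, ex2.strip()


for line in ["scaffold_0 3002 2420", "scaffold_0 exon 3002 2420"]:
    print(repr(line), "->", parse_exon_line(line))
```

Running every non-blank line of the exon file through a check like this should show which lines the pattern refuses.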

Hope it helps somehow.

