I'm a newbie starting to use gene prediction software and am trying to understand how GeneMarkS works. I was able to process my dataset with GeneMarkS and have been going through the intermediate files (.mod, .lst, and .faa). Can I get some help understanding these formats and how the program works?
- What do COD1 and COD2 in the .mod file mean? COD seems to stand for coding region and NONC for non-coding region, and the numbers in each section seem to be transition probabilities for the HMM. Is that correct?
- Why are there 64 rows in each of the COD1/COD2/NONC sections? Is it one row per codon (4x4x4 = 64)? If so, does anyone know how the codons are ordered in that file?
- Can anyone help me understand what the native and heuristic model parameters mean and how GeneMarkS combines them?
- I am seeing that the predicted ORF sequences do not always start with methionine (codon AUG). Would there be any reason for this?
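
To put a number on the last point, here is a minimal sketch of how I tally the first residue of each predicted protein. It assumes the .faa file is ordinary FASTA (a `>` header line followed by sequence lines); the function name `first_residue_counts` and the three example records are made up for illustration and are not real GeneMarkS output.

```python
from collections import Counter

def first_residue_counts(fasta_text):
    """Count the first amino acid of every record in FASTA-formatted text."""
    counts = Counter()
    expect_sequence_start = False
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: the next non-empty line starts a new sequence.
            expect_sequence_start = True
        elif expect_sequence_start:
            counts[line[0].upper()] += 1
            expect_sequence_start = False
    return counts

# Made-up records, only to show the expected input shape.
example = """>gene_1
MKTAYIAKQR
QISFVKSHFS
>gene_2
VSKLTEQ
>gene_3
MLLAV
"""

counts = first_residue_counts(example)
print(counts)
```

On a real file, the text can be read with `open(path).read()` and passed to the same function; any key other than `M` in the result is a protein that does not start with methionine.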
I have tried to answer these questions by going through the papers, but haven't had any luck so far. Any help or guidance would be greatly appreciated!