I predicted genes from a genome fragment by using EVM based on results of ab initio prediction, homologous sequences alignments and RNA-seq (trinity with or without genome-reference) database. Then I used an in-house Perl script of EVM to extract cds sequences from this fragment based on evm.out.gff3, while I found some repeat regions (which have been masked as NNNN) in some cds sequences. My question is how to treat these sequences ? delete whole sequence or this region ?
It is my first time in gene prediction, so I asked for help here. If anyone can give some suggestions, please help me.
Thanks all !