Repeat regions found in CDS file (extracted from evm.out.gff3 using a Perl script)
5.9 years ago
Ginsea Chen ▴ 130

Dear all

I predicted genes on a genome fragment with EVM, based on the results of ab initio prediction, homologous sequence alignments, and RNA-seq assemblies (Trinity, with and without a genome reference). I then used an in-house Perl script of EVM to extract the CDS sequences from this fragment based on evm.out.gff3, and found that some CDS sequences contain repeat regions (which have been masked as NNNN). My question is how to treat these sequences: should I delete the whole sequence, or only the masked region?

This is my first time doing gene prediction, so I am asking for help here. Any suggestions would be appreciated.

Thanks all!

EVM CDS genome Repeats • 1.7k views
5.7 years ago
abascalfederico ★ 1.2k

Some CDSs genuinely overlap repeats (e.g. Alu sequences). Do not delete anything. I guess you are getting NNNN because you are retrieving the sequence from a masked genome file. I would suggest extracting the CDS sequences from an unmasked version of the genome instead.
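
In case it helps, here is a minimal Python sketch of that idea: pull the CDS sequences out of the unmasked genome using the coordinates in evm.out.gff3. It is only an illustration, not the EVM script itself. The function names (`read_fasta`, `extract_cds`, `n_fraction`) are made up for this example, and it assumes each CDS row carries a single `Parent=` attribute and that all CDS segments of a transcript are on the same strand.

```python
from collections import defaultdict

# Translation table for complementing DNA; case is kept so soft-masking survives.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA lines into {sequence_id: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def extract_cds(gff_lines, genome):
    """Join the CDS segments of each transcript (grouped by Parent)."""
    segments = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        parent = attrs.get("Parent")  # assumption: exactly one parent per CDS
        if parent is None:
            continue
        segments[parent].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))

    cds = {}
    for parent, segs in segments.items():
        segs.sort(key=lambda s: s[1])  # order segments by start coordinate
        # GFF3 coordinates are 1-based and inclusive.
        seq = "".join(genome[chrom][start - 1:end] for chrom, start, end, _ in segs)
        if segs[0][3] == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        cds[parent] = seq
    return cds


def n_fraction(seq):
    """Fraction of a sequence that is N (i.e. hard-masked)."""
    return seq.upper().count("N") / len(seq) if seq else 0.0
```

To use it, open the unmasked genome FASTA and evm.out.gff3 as text files and pass the file handles to `read_fasta` and `extract_cds`. Running `n_fraction` on CDS sequences that were extracted from the masked genome gives a quick view of how much of each prediction overlaps a repeat, without throwing any sequence away.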


