Question

Repeat regions found in CDS file (extracted from evm.out.gff3 using a Perl script)

8.4 years ago

Ginsea Chen ▴ 130

Dear all

I predicted genes from a genome fragment using EVM, based on the results of ab initio prediction, homologous sequence alignments, and RNA-seq data (Trinity assemblies, with or without a genome reference). I then used an in-house Perl script of EVM to extract CDS sequences from this fragment based on evm.out.gff3, but I found that some CDS sequences contain repeat regions (which have been masked as NNNN). My question is how to treat these sequences: should I delete the whole sequence, or only the masked region?

This is my first time doing gene prediction, so I am asking for help here. Any suggestions would be appreciated.

Thanks, all!

EVM CDS genome Repeats • 2.4k views

updated 8.3 years ago by abascalfederico ★ 1.2k • written 8.4 years ago by Ginsea Chen ▴ 130

Ram · Answer 1 · 2016-01-20

8.3 years ago

abascalfederico ★ 1.2k

Some CDSs overlap repeats (e.g. Alu sequences), so do not delete anything. I guess you are getting NNNN because you are retrieving the sequences from a masked genome file. I would suggest using an unmasked version of the genome instead.
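To illustrate the point, here is a minimal Python sketch of the extraction step, not the EVM script itself. It assumes that each CDS row in the GFF3 carries a `Parent` attribute and that coordinates are 1-based and inclusive, as in the GFF3 specification. The function names `read_fasta` and `extract_cds` and the toy data are made up for this example.

```python
from collections import defaultdict

# Complement table for reverse-complementing minus-strand features.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def read_fasta(lines):
    """Return {sequence_id: sequence} from an iterable of FASTA lines."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}

def extract_cds(gff_lines, genome):
    """Return {parent_id: spliced CDS sequence} from GFF3 CDS features."""
    parts = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # Assumption: CDS segments of one transcript share a Parent value.
        parent = attrs.get("Parent", attrs.get("ID"))
        parts[parent].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    cds = {}
    for parent, segs in parts.items():
        segs.sort(key=lambda s: s[1])  # order segments by start coordinate
        seq = "".join(genome[c][start - 1:end] for c, start, end, _ in segs)
        if segs[0][3] == "-":
            seq = seq.translate(COMPLEMENT)[::-1]
        cds[parent] = seq
    return cds

# Toy data: the same contig, unmasked and hard-masked at positions 6-8.
unmasked = read_fasta([">contig1", "GGATGAAACCCGGGTTTTAGCC"])
masked = read_fasta([">contig1", "GGATGNNNCCCGGGTTTTAGCC"])
gff = [
    "contig1\tEVM\tCDS\t3\t8\t.\t+\t0\tID=cds1;Parent=mRNA1",
    "contig1\tEVM\tCDS\t15\t20\t.\t+\t0\tID=cds1;Parent=mRNA1",
]
print(extract_cds(gff, masked)["mRNA1"])    # contains the masked Ns
print(extract_cds(gff, unmasked)["mRNA1"])  # full CDS, nothing lost
```

The gene model is identical in both cases. Only the FASTA file used for sequence retrieval changes, so re-running the extraction against the unmasked (or soft-masked) genome recovers the complete CDS.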

updated 4.3 years ago by Ram 43k • written 8.3 years ago by abascalfederico ★ 1.2k