Question: Repeat regions found in CDS file (extracted from evm.out.gff3 using a Perl script)
4.9 years ago, Ginsea Chen (Chinese Academy of Tropical Agricultural Sciences, Danzhou, China) wrote:

Dear all

I predicted genes on a genome fragment using EVM, combining the results of ab initio prediction, homologous sequence alignments and RNA-seq assemblies (Trinity, with and without a genome reference). I then used an in-house EVM Perl script to extract CDS sequences from this fragment based on evm.out.gff3, and found that some CDS sequences contain repeat regions (which had been masked as NNNN). My question is how to treat these sequences: should I delete the whole sequence, or only the masked region?

This is my first time doing gene prediction, so I am asking for help here. Any suggestions would be appreciated.

Thanks all !

evm repeats cds genome • 1.5k views
modified 4.8 years ago by abascalfederico • written 4.9 years ago by Ginsea Chen
4.8 years ago, abascalfederico wrote:

Some CDSs genuinely overlap repeats (e.g. Alu sequences), so do not delete anything. I guess you are getting NNNN because you are retrieving the sequences from a masked genome file. I suggest using an unmasked version of the genome when extracting them.
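
As a rough illustration of that suggestion, here is a minimal Python sketch that keeps the coordinates from evm.out.gff3 but reads the bases from an unmasked genome FASTA. It is not the EVM script, only a stand-in under some assumptions: the function names `read_fasta` and `extract_cds` are made up for this example, each CDS feature is assumed to carry a single `Parent=` attribute, and all CDS segments of one parent are assumed to lie on the same strand.

```python
from collections import defaultdict

# Translation table used to complement minus-strand sequences.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Return {sequence_id: sequence} from an iterable of FASTA lines."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Keep only the first word of the header as the sequence ID.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def extract_cds(gff_lines, genome):
    """Return {parent_id: spliced CDS} from GFF3 CDS features.

    Assumption: one Parent per CDS feature, one strand per parent.
    """
    parts = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        parent = attrs.get("Parent")
        if parent is None:
            continue
        parts[parent].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))

    cds = {}
    for parent, segs in parts.items():
        # Join segments in genomic order; GFF3 coordinates are 1-based, inclusive.
        segs.sort(key=lambda s: s[1])
        seq = "".join(genome[c][start - 1:end] for c, start, end, _ in segs)
        if segs[0][3] == "-":
            # Minus strand: reverse-complement the joined sequence.
            seq = seq.translate(COMPLEMENT)[::-1]
        cds[parent] = seq
    return cds


if __name__ == "__main__":
    # File names here are placeholders; point them at your own files.
    with open("genome.unmasked.fa") as fa, open("evm.out.gff3") as gff:
        for parent, seq in extract_cds(gff, read_fasta(fa)).items():
            print(f">{parent}\n{seq}")
```

Running the same extraction on the masked and the unmasked genome and comparing the output (for example with `"NNNN" in seq`) should show whether the N runs come only from masking.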

modified 10 months ago by RamRS • written 4.8 years ago by abascalfederico