Question: What is the best way to validate coding sequence (CDS) data from public databases?
4.1 years ago by
chevivien70 wrote:

Hi,

I am currently working on mammalian CDS data downloaded from Ensembl 83. Since I wanted to be sure I am working with "real CDS" sequences, I ran gene prediction with AUGUSTUS, using the CDS data as input and human data as the training set. Surprisingly, for the few species I have run so far, I am getting fewer coding sequences than Ensembl provides. For instance, for one genome I downloaded roughly 20,000 coding sequences from Ensembl; on running AUGUSTUS I got 18,000 genes.

I am wondering whether there is a better way to validate CDS data from public databases, apart from using gene prediction tools. A difference of more than 4000 genes does not make sense to me.
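
To make concrete what I mean by validation without gene prediction: one simple option would be a structural check on each CDS record. The sketch below is only an illustration, not something I have run on the Ensembl data. It assumes the standard nuclear genetic code (so mitochondrial genes would be flagged wrongly), and the names `parse_fasta` and `cds_problems` are my own, hypothetical helpers. A flagged record is not necessarily wrong; it may simply be a partial annotation.

```python
# Stop codons of the standard genetic code (assumption: nuclear genes only).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {record id: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the id.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def cds_problems(seq):
    """Return a list of structural problems found in one CDS sequence."""
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length not a multiple of 3")
    if not seq.startswith("ATG"):
        problems.append("no ATG start codon")
    # Split into complete codons, ignoring any trailing 1-2 bases.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no terminal stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal in-frame stop codon")
    return problems


if __name__ == "__main__":
    # Toy records for illustration only, not real Ensembl sequences.
    example = ">good\nATGAAATAA\n>bad\nATGTAAAAAC\n"
    for record_id, sequence in parse_fasta(example).items():
        print(record_id, cds_problems(sequence))
```

Counting how many downloaded records pass all four checks would give a number to compare against the AUGUSTUS gene count.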

