What is the best way to validate coding sequence (CDS) data from public databases?
8.2 years ago
chevivien ▴ 90

Hi,

I am currently working on mammalian CDS data downloaded from Ensembl 83. To be sure I am working with "real" CDS sequences, I ran gene prediction with AUGUSTUS, using the CDS data as input and human as the training set. Surprisingly, for the few species I have run so far, I get fewer coding sequences than Ensembl provides. For instance, for one genome I downloaded roughly 20,000 coding sequences from Ensembl, but running AUGUSTUS gave me 18,000 genes.

I am wondering whether there is a better way to validate CDS data from public databases, apart from using gene prediction tools. A difference of more than 4000 genes does not make sense to me.
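
For context, one prediction-free check is to test each downloaded sequence for basic CDS structure: length divisible by three, an ATG start, a terminal stop codon, and no in-frame internal stops. Below is a minimal sketch of that idea. The function names (`read_fasta`, `check_cds`, `summarize`) and the file name `cds.fa` are hypothetical, and it assumes the standard nuclear genetic code, so mitochondrial genes and transcripts annotated as incomplete at the 5' or 3' end would be flagged even though they may be legitimate.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Assumption: standard nuclear genetic code (not valid for mitochondrial genes).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Keep only the first whitespace-delimited token as the identifier.
            header = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def check_cds(seq: str) -> List[str]:
    """Return a list of structural problems found in one CDS (empty if none)."""
    problems = []
    complete = len(seq) % 3 == 0
    if not complete:
        problems.append("length_not_multiple_of_3")
    if not seq.startswith("ATG"):
        problems.append("no_start_codon")
    if seq[-3:] not in STOP_CODONS:
        problems.append("no_terminal_stop")
    # In-frame codons read from the first base; drop any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # The final codon is the expected stop only when the length is in frame.
    internal = codons[:-1] if complete else codons
    if any(codon in STOP_CODONS for codon in internal):
        problems.append("internal_stop")
    if set(seq) - set("ACGT"):
        problems.append("ambiguous_bases")
    return problems


def summarize(records: Iterable[Tuple[str, str]]) -> Counter:
    """Count how many sequences show each problem; 'ok' counts clean ones."""
    counts = Counter()
    for _, seq in records:
        problems = check_cds(seq)
        counts.update(problems if problems else ["ok"])
    return counts


if __name__ == "__main__":
    # Hypothetical input file; replace with the downloaded CDS FASTA.
    with open("cds.fa") as handle:
        for label, n in summarize(read_fasta(handle)).most_common():
            print(label, n)
```

Sequences that fail these checks are not necessarily wrong, but the per-category counts show how much of the set is structurally complete before it is compared against gene predictions.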

Augustus CDS-prediction Ensembl • 1.8k views