Question: Multiple CDSs in GenBank
vilsnk wrote (6.7 years ago):

I understand that a similar question has already been asked ("How To Know Which Is The Start Codon And Stop Codon For A Gene Sequence?"), but I am completely new to bioinformatics and would really appreciate it if someone could spell it out for me. The GenBank entry for the gene I am interested in is shown here; however, it appears to list 3 different CDSs for this single gene. I am trying to construct a phylogeny of a paralog family of which this gene is a member. Is there an accepted CDS among the 3 available that I should use when performing a codon alignment, or is there something else I am missing here? Is the CDS even what I should be looking at to construct the alignment for the phylogeny?

phylogeny genbank • 1.6k views
Emily_Ensembl (EMBL-EBI) wrote (6.7 years ago):

We face the same problem when doing our comparative genomics analysis in Ensembl. We try to pick a representative or canonical transcript using the following rules:

For human, the canonical transcript for a gene is set according to the following hierarchy:
1. Longest CCDS translation with no stop codons.
2. If no (1), choose the longest Ensembl/Havana merged translation with no stop codons.
3. If no (2), choose the longest translation with no stop codons.
4. If no translation, choose the longest non-protein-coding transcript.

So essentially, we use the longest one with the most evidence behind it. We don't in any way claim that this is the "best" transcript in terms of biological significance, but it is the most useful for comparative genomics as longer means more to work with, and more evidence means it's more likely to be real.

Some similar selection criteria should work for you.
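
As a rough illustration of how a rule like this could be applied to the CDSs of a single GenBank entry, here is a minimal Python sketch. It is not Ensembl's actual code. The `Candidate` type, the `tier` field (0 = best-supported evidence class, larger = weaker) and the example sequences are all hypothetical, and "no stop codons" is assumed to mean no stop before the terminal one. Step 4 of the hierarchy (non-coding transcripts) is left out.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    name: str      # hypothetical label for one CDS / transcript
    tier: int      # hypothetical evidence tier: 0 = best supported, larger = weaker
    protein: str   # translated CDS; '*' marks a stop codon


def core(protein: str) -> str:
    """Return the translation without a single terminal stop, if present."""
    return protein[:-1] if protein.endswith("*") else protein


def has_internal_stop(protein: str) -> bool:
    """True if a stop codon occurs anywhere before the terminal position."""
    return "*" in core(protein)


def pick_representative(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Pick the longest clean translation from the best available tier."""
    usable = [c for c in candidates
              if core(c.protein) and not has_internal_stop(c.protein)]
    if not usable:
        return None
    # Lower tier wins first; within a tier, the longer translation wins.
    return min(usable, key=lambda c: (c.tier, -len(core(c.protein))))


# Made-up example: three well-supported CDSs and one weaker, longer one.
example = [
    Candidate("cds_a", 2, "MKTAYIAK*"),
    Candidate("cds_b", 0, "MKT*"),
    Candidate("cds_c", 0, "MKTAY*"),
    Candidate("cds_d", 0, "MKT*AYIAKQR*"),  # internal stop, so excluded
]
chosen = pick_representative(example)
print(chosen.name if chosen else "no usable CDS")
```

Whatever rule you settle on, applying the same rule to every member of the paralog family keeps the choice consistent across the alignment.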


Thanks so much for your speedy reply. My supervisor led me to believe that there would be an accepted, preferred CDS for each gene on which to base analyses (experimentally validated or not), but if that is not the case then I'll just have to work it as best I can. Again, thank you.

written 6.7 years ago by vilsnk

The goal of education is that you know more than your supervisor; you're on the way!

written 6.7 years ago by Neilfws