Question

Multiple CDSs in GenBank

11.7 years ago

vilsnk ▴ 30

I understand that a similar question has already been asked ("How To Know Which Is The Start Codon And Stop Codon For A Gene Sequence?"), but I am completely new to bioinformatics and would really appreciate it if someone could spell it out for me. The GenBank entry for the gene I am interested in is shown here; however, it appears to list three different CDSs for this single gene. I am trying to construct a phylogeny of a paralog family of which this gene is a member. Is there an accepted CDS among the three that I should use when performing a codon alignment, or is there something else I am missing? Is the CDS even what I should be looking at to construct the alignment for the phylogeny?

genbank phylogeny • 2.4k views

score 3 · Answer 1 · 2013-11-12

11.7 years ago

Emily 24k

We face the same problem when doing our comparative genomics analysis in Ensembl. We try to pick a representative or canonical transcript using the following rules:

For human, the canonical transcript for a gene is set according to the following hierarchy:

1. Longest CCDS translation with no stop codons.
2. If no (1), choose the longest Ensembl/Havana merged translation with no stop codons.
3. If no (2), choose the longest translation with no stop codons.
4. If no translation, choose the longest non-protein-coding transcript.

So essentially, we use the longest one with the most evidence behind it. We don't in any way claim that this is the "best" transcript in terms of biological significance, but it is the most useful for comparative genomics as longer means more to work with, and more evidence means it's more likely to be real.

Some similar selection criteria should work for you.
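As a rough illustration, the hierarchy above can be sketched in plain Python. This is only a sketch under stated assumptions, not Ensembl's actual code: the `Transcript` class, its field names, and the `pick_canonical` function are all hypothetical, and the protein string is assumed to exclude the terminal stop so that any `*` marks an unwanted stop codon.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Transcript:
    # Hypothetical record; every field name here is an assumption of this sketch.
    name: str
    length: int                  # transcript length in nucleotides
    protein: Optional[str]       # translation without terminal stop, None if non-coding
    is_ccds: bool = False        # translation belongs to the CCDS set
    is_merged: bool = False      # Ensembl/Havana merged annotation


def has_clean_translation(t: Transcript) -> bool:
    """True if the transcript has a non-empty translation containing no '*'."""
    protein = t.protein or ""
    return len(protein) > 0 and "*" not in protein


def pick_canonical(transcripts: Sequence[Transcript]) -> Optional[Transcript]:
    """Apply the four-step hierarchy and return one transcript, or None."""
    clean = [t for t in transcripts if has_clean_translation(t)]

    # Steps 1 to 3: progressively weaker evidence, longest translation wins.
    tiers = [
        [t for t in clean if t.is_ccds],
        [t for t in clean if t.is_merged],
        clean,
    ]
    for tier in tiers:
        if tier:
            return max(tier, key=lambda t: len(t.protein or ""))

    # Step 4: no usable translation, so take the longest non-coding transcript.
    noncoding = [t for t in transcripts if t.protein is None]
    if noncoding:
        return max(noncoding, key=lambda t: t.length)

    # Assumption: if every translation contains a stop codon and nothing is
    # non-coding, the rules quoted above do not say what to pick; return None.
    return None


if __name__ == "__main__":
    candidates = [
        Transcript("cds_1", 900, "M" * 100, is_ccds=True),
        Transcript("cds_2", 1200, "M" * 300, is_merged=True),
        Transcript("cds_3", 1500, "M" * 400),
    ]
    chosen = pick_canonical(candidates)
    print(chosen.name if chosen else "no candidate")
```

For a GenBank record with several CDS features and no CCDS or Havana flags, only step 3 applies, which reduces to picking the longest translation that has no internal stop codons.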

Thanks so much for your speedy reply. My supervisor led me to believe that there would be an accepted, preferred CDS for each gene on which to base analyses (experimentally validated or not), but if that is not the case then I'll just have to work it out as best I can. Again, thank you.

11.7 years ago by vilsnk ▴ 30

The goal of education is that you know more than your supervisor; you're on the way!

11.7 years ago by Neilfws 49k