Question

Why don't phylogenetic analyses using DNA sequences and protein sequences result in identical trees?

2

7.4 years ago

mafrah18 ▴ 20

I ran one phylogenetic analysis on the DNA sequences and another on the protein sequences, both with the rooted UPGMA method, for the same accession number "AB032107", but the trees are not identical. Why don't the protein sequence and the DNA sequence give the same tree?

sequencing sequence phylogenetic • 3.5k views

updated 6.6 years ago by Biostar 20 • written 7.4 years ago by mafrah18 ▴ 20

0

Hello mafrah18!

It appears that your post has been cross-posted to another site: http://biology.stackexchange.com/questions/54163/why-dont-phylogenetic-analyses-using-dna-sequences-and-protein-sequences-result

If this post is not yours then let us know, but it appears to be very similar.

This is typically not recommended as it runs the risk of annoying people in both communities.

7.4 years ago by GenoMax 141k

score 2 · Answer 1 · 2016-12-12

2

7.4 years ago

Thomas ▴ 160

Because a DNA sequence and the equivalent polypeptide sequence do not contain the same information.

In the transition from DNA -> polypeptide, information is lost.

A codon will always map to the same single amino acid, but an amino acid does not map back just to one single codon. Hence, there is some ambiguity.
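The many-to-one mapping can be made concrete with a minimal sketch. It assumes the standard genetic code (NCBI translation table 1), written in the usual compact form; the names `CODON_TABLE`, `CODONS_FOR` and `translate` are illustrative, not taken from any particular library.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Forward map: each codon gives exactly one amino acid ('*' marks a stop codon).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Reverse map: each amino acid gives the list of codons that encode it.
CODONS_FOR = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    CODONS_FOR[aa].append(codon)


def translate(dna):
    """Translate an in-frame DNA string, codon by codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    # Two different DNA strings, one and the same protein string.
    print(translate("CTGCTG"), translate("TTATTA"))
    # Number of codons that encode leucine versus methionine.
    print(len(CODONS_FOR["L"]), len(CODONS_FOR["M"]))
```

Going from DNA to protein is unambiguous, but going back is not: a leucine in the protein could have come from any of several codons, so differences between those codons are invisible at the protein level.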

Because phylogenetic trees are constructed in part on the basis of the similarity between biological sequences, and polynucleotide and polypeptide sequences contain differing information, it is entirely possible to construct phylogenetic trees from a DNA sequence and the corresponding polypeptide sequence and get differing results.
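A contrived toy example shows how this can change a tree. The three sequences below are invented for illustration (they are not from AB032107), the distance is a plain proportion of differing sites, and only the first clustering step of a UPGMA-style method (joining the closest pair) is shown. The small codon dictionary covers just the codons used and follows the standard genetic code.

```python
from itertools import combinations

# Only the codons used in this toy example (standard genetic code).
CODONS = {"CTG": "L", "TTA": "L", "AAA": "K", "AAT": "N"}


def translate(dna):
    """Translate an in-frame DNA string using the toy codon table."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


def p_distance(a, b):
    """Proportion of aligned positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_pair(seqs):
    """Return the pair of names with the smallest p-distance (the first UPGMA join)."""
    return min(
        combinations(sorted(seqs), 2),
        key=lambda pair: p_distance(seqs[pair[0]], seqs[pair[1]]),
    )


# Hypothetical sequences: A and C differ only by synonymous changes,
# while A and B differ by a single non-synonymous change.
dna = {"A": "CTGCTGAAA", "B": "CTGCTGAAT", "C": "TTATTAAAA"}
protein = {name: translate(seq) for name, seq in dna.items()}

if __name__ == "__main__":
    print("DNA joins first:    ", closest_pair(dna))
    print("Protein joins first:", closest_pair(protein))
```

At the DNA level A and B are the closest pair, but at the protein level A and C are identical and join first, so the two trees start with different groupings. Real data and real distance corrections are messier, but the mechanism is the same.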


2

This is true in general, but most DNA alignment and phylogenetic tree building methods in common use, at least the simple ones, are not codon models. The information content, as it were, and the complexity of phylogenetic models go as Codon > Protein > Nucleotide. Since the OP is doing UPGMA, it is a simple nucleotide model, meaning there are actually fewer states (4 nucleotides versus 20 amino acids) than in the amino acid models. Codon models are a lot more complex and aren't used nearly as often as they could be. They used to be computationally demanding and not doable for decent-sized alignments.

7.4 years ago by DG 7.3k

score 2 · Answer 2 · 2016-12-12

@Thomas was on the right track with his answer, but as I mentioned in my comment on that post, in your case he has it backwards. Nucleotide models (which are different from codon models) are less complex in terms of modelling evolutionary changes than protein models. You're dealing with a 4x4 transition matrix instead of a 20x20 matrix, and similarly with estimates for 4 nucleotide frequencies versus 20 amino acid frequencies. If your sequences are highly similar to one another, a nucleotide alignment and tree is sometimes more appropriate, as synonymous changes will still be informative. However, as the diversity of your sequences increases, protein alignments will become more informative because they have much more information content. You could also go to a full codon model, although I don't know if any are implemented in the software you may be using. Codon alignments can also get a little tricky to do if you aren't familiar with the methods.
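To put rough numbers on the difference in model size, here is a small sketch. It assumes a general time-reversible substitution model, which is an assumption made for illustration and not something the OP's software necessarily uses; the function names are hypothetical. In practice many of these parameters are fixed or constrained (empirical protein matrices, for example), so treat the counts as upper bounds.

```python
def rate_matrix_entries(n_states):
    """Number of cells in an n x n substitution matrix."""
    return n_states * n_states


def gtr_free_parameters(n_states):
    """Free parameters of a general time-reversible model with n states.

    Assumption: symmetric exchangeabilities (one fixed to set the scale)
    plus equilibrium frequencies that must sum to 1.
    """
    exchangeabilities = n_states * (n_states - 1) // 2 - 1
    frequencies = n_states - 1
    return exchangeabilities + frequencies


if __name__ == "__main__":
    models = [("nucleotide", 4), ("amino acid", 20), ("codon (61 sense codons)", 61)]
    for name, n in models:
        print(f"{name}: {rate_matrix_entries(n)} matrix entries, "
              f"up to {gtr_free_parameters(n)} free parameters")
```

The jump from 4 to 20 to 61 states is why protein and codon models are so much heavier than nucleotide models.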

score 1 · Answer 3 · 2016-12-10

To build any tree you need a multiple sequence alignment.

In my opinion, this post covers many of the details you need:

Multiple Alignment: Protein Or Nucleotide Sequence?

See also this discussion below - it might be helpful.

https://www.researchgate.net/post/Which_is_more_informative_a_phylogenetic_tree_based_on_alignment_of_protein_amino_acid_sequences_or_one_based_on_the_corresponding_DNA_sequences11