Question

Building phylogeny in MEGA 6

1

9.2 years ago

nick.w.jeffery3 ▴ 10

I am trying to build an ML phylogeny in MEGA 6 using COI gene sequences that vary in length from around 550 to 650 bp. The sequences are aligned. Is it best to use the "Use all sites", "Partial Deletion" or "Complete Deletion" option when estimating which nucleotide model to use and when building the actual phylogeny? Thanks in advance.

phylogeny COI MEGA • 4.1k views

ADD COMMENT • link updated 23 months ago by Ram 43k • written 9.2 years ago by nick.w.jeffery3 ▴ 10

Ram · Answer 1 · 2015-03-02

1

9.2 years ago

Brice Sarver ★ 3.8k

I prefer to use all information available. In most implementations, missing (or ambiguous) data does not contribute to the single-site likelihood. By contrast, complete deletion removes every site that has a gap or ambiguity in any sequence before running the analysis, while partial deletion removes only the sites whose proportion of missing data exceeds a threshold. Either deletion option could really truncate your dataset, depending on how sparse it is.
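To make the difference between the three options concrete, here is a minimal Python sketch on a toy alignment. It is not MEGA's code: the function name `filter_sites`, the `cutoff` parameter (a minimum fraction of sequences with data at a site, loosely in the spirit of a site coverage cutoff), and the choice to count `-`, `?` and `N` as missing are all assumptions made for illustration.

```python
def filter_sites(alignment, mode="all", cutoff=0.95, missing="-?N"):
    """Return the column indices kept under a gap/missing-data treatment.

    Hypothetical helper, not MEGA's implementation.
    mode="all"      keeps every column.
    mode="complete" drops any column with at least one missing character.
    mode="partial"  keeps columns where the fraction of sequences with
                    data is at least `cutoff`.
    """
    if mode not in ("all", "complete", "partial"):
        raise ValueError(f"unknown mode: {mode}")
    n_seqs = len(alignment)
    kept = []
    for col in range(len(alignment[0])):
        n_missing = sum(seq[col].upper() in missing for seq in alignment)
        coverage = (n_seqs - n_missing) / n_seqs
        if mode == "all":
            kept.append(col)
        elif mode == "complete" and n_missing == 0:
            kept.append(col)
        elif mode == "partial" and coverage >= cutoff:
            kept.append(col)
    return kept


# Toy alignment: 4 sequences, 8 columns, ragged ends and one internal gap.
aln = [
    "ACGTAC--",
    "ACGTACGT",
    "AC-TACGT",
    "ACGTAC-T",
]

for mode in ("all", "complete", "partial"):
    kept = filter_sites(aln, mode=mode, cutoff=0.75)
    print(mode, len(kept), kept)
```

On this toy input, complete deletion throws away every column that has even one gap, whereas partial deletion with a 0.75 cutoff only drops the column where half the sequences are missing. With sequences of unequal length, the ragged ends are exactly where complete deletion removes the most sites.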

Regardless of what you select, you need to use the same approach for estimating the model and estimating the phylogeny. This is important; the best-fit model of nucleotide sequence evolution might change once you remove sites.

You can usually perform more rigorous phylogenetic inference outside of MEGA using Garli, MrBayes, BEAST, etc. Might be something to consider if you're so inclined.

ADD COMMENT • link updated 23 months ago by Ram 43k • written 9.2 years ago by Brice Sarver ★ 3.8k

0

Thanks for the input; I also prefer to use all information available. So would you recommend using all sites, if it were you? I'll also be using MrBayes afterwards, which I'm more familiar with.

ADD REPLY • link 9.2 years ago by nick.w.jeffery3 ▴ 10

1

I would, because even if a site has a high percentage of missing data, it may still carry some information, so why exclude it? There is precedent for this in the literature, too: huge gene-by-taxa supermatrices with >90% missing data still resolve deep splits because the information content is there.
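As a rough illustration of why a partly missing site is harmless, here is a small Python sketch of a single-site likelihood under JC69 on a star tree (all tips joined at the root). It is a toy under stated assumptions, not what MEGA or MrBayes actually runs: the function names, the star topology, and the branch lengths are all made up for the example. A missing tip is given a partial-likelihood vector of all ones, which is the usual way missing data is handled in pruning-style likelihood calculations.

```python
import math

BASES = "ACGT"


def jc69_prob(i, j, t):
    """JC69 probability of base i changing to base j over branch length t
    (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def tip_vector(state):
    """Partial likelihoods at a tip: 1 for the observed base, or all 1s
    when the state is missing (assumed here to be '-', '?' or 'N')."""
    if state in ("-", "?", "N"):
        return [1.0] * 4
    return [1.0 if b == state else 0.0 for b in BASES]


def site_likelihood(states, branch_lengths):
    """Likelihood of one alignment column on a star tree (toy topology)."""
    total = 0.0
    for x in BASES:
        term = 0.25  # JC69 equilibrium frequency of root state x
        for state, t in zip(states, branch_lengths):
            vec = tip_vector(state)
            term *= sum(jc69_prob(x, y, t) * v for y, v in zip(BASES, vec))
        total += term
    return total


# Two observed tips versus the same two tips plus one missing tip.
two_tips = site_likelihood("AC", [0.1, 0.2])
with_missing = site_likelihood("AC-", [0.1, 0.2, 0.3])
longer_branch = site_likelihood("AC-", [0.1, 0.2, 5.0])

print(math.isclose(two_tips, with_missing))
print(math.isclose(with_missing, longer_branch))
```

The missing tip multiplies each term by a sum of transition probabilities, which is 1, so it neither adds to nor distorts the likelihood at that site, while the observed tips at the same site still contribute their information.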

ADD REPLY • link 9.2 years ago by Brice Sarver ★ 3.8k

0

Dear Brice: thank you for this insight about "using all sites" and the maximum phylogenetic information. Would you be able to give me references with which it would be possible to defend such an approach? Many thanks, Leonhard

ADD REPLY • link 7.8 years ago by leoschnittger • 0