Question: Building phylogeny in MEGA 6
4.1 years ago, nick.w.jeffery3 (Canada) wrote:

I am trying to build an ML phylogeny in MEGA 6 using COI gene sequences that range in length from about 550 to 650 bp. The sequences are aligned. Is it best to use the "Use all sites", "Partial Deletion", or "Complete Deletion" option, both when estimating which nucleotide model to use and when building the actual phylogeny? Thanks in advance.
 

coi mega phylogeny • 2.5k views
modified 4.1 years ago by Brice Sarver • written 4.1 years ago by nick.w.jeffery3

4.1 years ago, Brice Sarver (United States) wrote:

I prefer to use all information available. Missing (or ambiguous) data does not contribute to the single-site likelihood in most implementations. By contrast, complete deletion removes every site that contains a gap or ambiguity in any sequence before the analysis is run, while partial deletion removes only those sites where the proportion of missing data exceeds a threshold. Either option could really truncate your dataset, depending on how sparse it is.
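To make the difference between the three options concrete, here is a minimal Python sketch of how each one would select columns from a toy alignment. It is only an illustration, not MEGA's actual code: the function names, the `cutoff` parameter (loosely modeled on a site-coverage threshold), and the choice to treat `-`, `?` and `N` as missing are all assumptions made for this example.

```python
# Assumption: gaps, '?' and 'N' all count as missing/ambiguous characters.
MISSING = set("-?N")


def site_coverage(alignment):
    """Return, for each column, the fraction of sequences with a usable base."""
    n_seqs = len(alignment)
    n_sites = len(alignment[0])
    return [
        sum(seq[i].upper() not in MISSING for seq in alignment) / n_seqs
        for i in range(n_sites)
    ]


def kept_sites(alignment, treatment="all", cutoff=0.75):
    """Return the column indices that survive a given gap treatment.

    treatment: "all" (keep everything), "complete" (drop any column with
    missing data) or "partial" (drop columns whose coverage is below cutoff).
    """
    coverage = site_coverage(alignment)
    if treatment == "all":
        return list(range(len(coverage)))
    if treatment == "complete":
        return [i for i, c in enumerate(coverage) if c == 1.0]
    if treatment == "partial":
        return [i for i, c in enumerate(coverage) if c >= cutoff]
    raise ValueError(f"unknown treatment: {treatment}")


# Hypothetical toy alignment: four sequences, six columns.
toy = [
    "ATGC--",
    "ATGCA-",
    "ATGCAT",
    "A-GCAT",
]

for option in ("all", "complete", "partial"):
    print(option, kept_sites(toy, option))
```

On this toy input, complete deletion keeps only half of the columns, while a 75% coverage cutoff drops just the sparsest one, which is the truncation effect described above.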

Regardless of what you select, you need to use the same approach for estimating the model and estimating the phylogeny. This is important; the best-fit model of nucleotide sequence evolution might change once you remove sites.

You can usually perform more rigorous phylogenetic inference outside of MEGA using GARLI, MrBayes, BEAST, etc. That might be something to consider if you're so inclined.

written 4.1 years ago by Brice Sarver

Thanks for the input, I also prefer to use all information available. So would you recommend using all sites (if it were you)? I'll be using MrBayes afterwards too, which I'm more familiar with.

written 4.1 years ago by nick.w.jeffery3

I would, because even if there is a high percentage of missing data at a site, there may still be some information. Why exclude it? There is precedent for this in the literature, too: huge gene-by-taxa supermatrices with more than 90% missing data still resolve deep splits because the information content is there.
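A toy likelihood calculation can illustrate why a mostly missing site is harmless and still informative. This is a minimal sketch under stated assumptions: a JC69 model on a star tree with equal branch lengths, chosen only for simplicity and not what MEGA or MrBayes would necessarily fit to COI data. The function names are hypothetical. A missing tip sums over all four possible bases, so its factor is 1 and the site likelihood is determined by the observed tips alone.

```python
import math

BASES = "ACGT"
MISSING = {"-", "?", "N"}  # assumption: all treated as fully missing


def jc69_prob(same, t):
    """JC69 probability of ending in the same (or one given different) base
    after a branch of length t (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def tip_factor(root_base, tip_state, t):
    """Probability of the tip observation given the base at the root.

    A missing tip is compatible with every base, so the sum over all four
    transition probabilities is 1 and the tip drops out of the product.
    """
    if tip_state in MISSING:
        return sum(jc69_prob(root_base == b, t) for b in BASES)
    return jc69_prob(root_base == tip_state, t)


def site_likelihood(tips, t=0.1):
    """Likelihood of one alignment column on a star tree (equal branches)."""
    total = 0.0
    for root_base in BASES:
        product = 1.0
        for state in tips:
            product *= tip_factor(root_base, state, t)
        total += 0.25 * product  # 0.25 = JC69 equilibrium base frequency
    return total


# A column with one missing tip scores the same as the column without that tip.
print(site_likelihood("AA-"), site_likelihood("AA"))
# An observed base at that tip, by contrast, changes the likelihood.
print(site_likelihood("AAC"))
```

In this sketch the two observed bases in `"AA-"` contribute exactly what they would in a two-taxon column, so nothing is lost by leaving the site in.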

written 4.1 years ago by Brice Sarver

Dear Brice: thank you for this insight about "using all sites" and the maximum phylogenetic information. Would you be able to give me references with which it would be possible to defend such an approach? Many thanks, Leonhard

written 2.8 years ago by leoschnittger