Question: Diploid sequences in ModelTest-NG: consensus or ambiguous sites?
Asked 2.1 years ago by CaffeSospeso:


I'm using nuclear genomic data from a diploid organism. I want to build a phylogenetic tree by concatenating all sequences. One of the first steps is to estimate the model of molecular evolution for the concatenated sequences, for instance with ModelTest-NG. However, I do not know how to deal with heterozygous sites.

Should I mark them as ambiguous sites, or should I simply use the consensus sequence?

Thank you

Tags: modeltest, diploid, next-gen
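A minimal Python sketch of the two options in the question, assuming each site of one individual is given as a two-letter diploid call such as "AG" (a toy input layout, not ModelTest-NG's format). `to_ambiguity` writes each heterozygote as its standard IUPAC code; `to_consensus` keeps a single allele, here using a hypothetical "first listed allele" rule in place of whatever rule (e.g. most read support) a real consensus pipeline would apply.

```python
# Standard IUPAC nucleotide codes, keyed by the unordered set of alleles.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
}


def to_ambiguity(genotypes):
    """Option 1: encode each diploid call (e.g. 'AG') as one IUPAC ambiguity symbol."""
    return "".join(IUPAC[frozenset(gt.upper())] for gt in genotypes)


def to_consensus(genotypes):
    """Option 2: collapse each diploid call to a single allele.

    Hypothetical rule: keep the first listed allele. A real pipeline would
    more likely keep the allele with the most read support.
    """
    return "".join(gt.upper()[0] for gt in genotypes)


calls = ["AA", "AG", "CT", "GG"]  # toy genotypes for one individual
print(to_ambiguity(calls))  # ARYG
print(to_consensus(calls))  # AACG
```

Note that in standard nucleotide likelihood models an ambiguity code such as R is read as uncertainty (the true base is A or G), not as the presence of both alleles, so the first option keeps the information that a site is polymorphic without actually modelling heterozygosity.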
Answer by Vitis (New York), 2.1 years ago:

I think it depends on the taxonomic/phylogenetic level you're trying to resolve. Above the species level, heterozygosity is usually ignored, on the assumption that divergence at that level has produced enough fixed differences between populations to resolve the relationships without considering intraspecific variation. For intraspecific relationships, you'll need phased genotypes, so that you can build haplotypes before phylogeny reconstruction and resolve the haploid lineages.

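To make "build haplotypes from phased genotypes" concrete, here is a small Python sketch. It assumes a hypothetical input of biallelic SNPs as `(ref, alt, gt)` tuples, where `gt` follows the VCF convention: allele indices 0 = REF and 1 = ALT, `|` for a phased call and `/` for an unphased one. Each phased call contributes one allele to each of two haplotype sequences.

```python
def phased_to_haplotypes(sites):
    """Split phased diploid calls into two haplotype sequences.

    `sites` is a hypothetical list of (ref, alt, gt) tuples for biallelic
    SNPs, where gt is a VCF-style phased genotype such as '0|1'.
    """
    hap1, hap2 = [], []
    for ref, alt, gt in sites:
        if "|" not in gt:
            # '/' marks an unphased call: the alleles cannot be assigned to haplotypes.
            raise ValueError(f"unphased genotype {gt!r}; phase the data first")
        i1, i2 = (int(x) for x in gt.split("|"))
        alleles = (ref, alt)  # index 0 = REF, 1 = ALT
        hap1.append(alleles[i1])
        hap2.append(alleles[i2])
    return "".join(hap1), "".join(hap2)


sites = [("A", "G", "0|0"), ("C", "T", "0|1"), ("G", "A", "1|0"), ("T", "C", "1|1")]
print(phased_to_haplotypes(sites))  # ('ACAC', 'ATGC')
```

Each haplotype would then enter the alignment as its own tip, so every diploid individual contributes two haploid lineages to the tree.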

Yes, I agree with your point. However, I wanted to relax these assumptions, since ignoring invariant or polymorphic sites can bias phylogenetic analyses. It is also true that there is no general rule, at least as far as I'm aware.

Concerning phased genotypes, I'm not sure about that. From the RAxML-NG manual, it seems that unphased genotypes can be used.

(Reply by CaffeSospeso, 2.1 years ago)

As far as I know, ignoring heterozygosity affects branch-length estimates in phylogenies at the species/subspecies level, but generally does not affect the tree topology. See this:

Regarding heterozygosity and phasing: as far as I've learned, there has not been much theoretical treatment of heterozygous sites as a source of phylogenetic signal (my theoretical background mostly comes from Inferring Phylogenies by Joseph Felsenstein). See the discussion here: !topic/raxml/D73d--TKE2E

Again, the heterozygosity involved there seems to have had no effect on the tree topology, but it did affect branch lengths.

(Reply by Vitis, 2.1 years ago)
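A toy Python illustration of why the treatment of heterozygous sites can change branch lengths. This is not what ModelTest-NG or RAxML-NG compute; it simply compares two per-site distances between two diploid individuals on made-up calls. `diploid_distance` averages the probability that one allele drawn at random from each individual differs, while `consensus_distance` is a plain p-distance after collapsing every call to its first allele (a hypothetical consensus rule). The function names are illustrative only.

```python
from itertools import product


def expected_mismatch(gt1, gt2):
    """Probability that one allele drawn at random from each genotype differs."""
    pairs = list(product(gt1.upper(), gt2.upper()))
    return sum(a != b for a, b in pairs) / len(pairs)


def diploid_distance(ind1, ind2):
    """Mean per-site expected mismatch between two diploid individuals."""
    return sum(expected_mismatch(a, b) for a, b in zip(ind1, ind2)) / len(ind1)


def consensus_distance(ind1, ind2):
    """Plain p-distance after collapsing each genotype to its first allele."""
    return sum(a[0].upper() != b[0].upper() for a, b in zip(ind1, ind2)) / len(ind1)


ind1 = ["AG", "CC", "TT", "GG"]  # toy diploid calls, one string per site
ind2 = ["AA", "CC", "TT", "GA"]
print(diploid_distance(ind1, ind2))    # 0.25
print(consensus_distance(ind1, ind2))  # 0.0
```

The same data give different distances under the two treatments, which is the kind of branch-length effect described above. The expected-mismatch measure is also non-zero between two identical heterozygous individuals (AG vs AG gives 0.5 at that site), because it includes within-individual diversity.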

I totally agree with you, and I have already seen these publications, but I was hoping for some progress since 2014 ;) However, if you look at the RAxML-NG wiki documentation, its state order also includes polymorphic nucleotides, I would say, under "GENOTYPE (diploid unphased)".

You can find it at the bottom of that wiki page on GitHub.

(Reply by CaffeSospeso, 2.1 years ago)
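A short Python sketch of what a "diploid unphased" genotype alphabet looks like: every unordered pair of nucleotides is one state, giving 4 homozygous plus 6 heterozygous genotypes, i.e. 10 states, and "AG" and "GA" are the same state because phase is unknown. The alphabetical ordering below is an arbitrary choice for illustration and is not claimed to be RAxML-NG's internal state order or input encoding; check the RAxML-NG wiki for those.

```python
from itertools import combinations_with_replacement

# Unordered allele pairs: 4 homozygous + 6 heterozygous = 10 genotype states.
GENOTYPE_STATES = ["".join(p) for p in combinations_with_replacement("ACGT", 2)]
# -> ['AA', 'AC', 'AG', 'AT', 'CC', 'CG', 'CT', 'GG', 'GT', 'TT']


def genotype_state(allele1, allele2):
    """Index of an unphased diploid genotype; allele order is ignored."""
    key = "".join(sorted((allele1.upper(), allele2.upper())))
    return GENOTYPE_STATES.index(key)


print(len(GENOTYPE_STATES))                                 # 10
print(genotype_state("A", "G"), genotype_state("G", "A"))  # 2 2
```

The design point is the contrast with IUPAC codes under a 4-state nucleotide model: here a heterozygote is an observed state of its own, with its own substitution rates, rather than uncertainty between two bases.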

I'm not saying that using heterozygous sites is strictly forbidden in phylogeny reconstruction, or that it would cause problems for RAxML. In my very personal opinion, I've been hesitant to use something not explicitly dealt with in theory. Also, I suggested phasing because I think it may provide more information for sorting the lineages, although without any theoretical foundation either. :)

(Reply by Vitis, 2.1 years ago)

Yes, I understood that you are not against heterozygous sites ;P , I was just wondering whether current tools allow them to be taken into account. However, I do not have a phased genome for the species I'm working on; I'm using the closest available relative, which is still very distant in evolutionary history. By the way, Vitis, thank you for this conversation and exchange of opinions.

(Reply by CaffeSospeso, 2.1 years ago)