Diploid sequences in ModelTest-NG: consensus or ambiguous sites?
6.2 years ago
CaffeSospeso

Hello,

I'm using nuclear genomic data from a diploid organism. I want to build a phylogenetic tree by concatenating all sequences. One of the first steps is to estimate the model of molecular evolution for the concatenated sequences, for instance with ModelTest-NG. However, I do not know how to deal with heterozygous sites.

Should I mark them as ambiguous sites, or should I simply use the consensus sequence?
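A minimal Python sketch of the two options, assuming the genotypes are already available as one allele pair per site (the helper names and input format are hypothetical). Worth keeping in mind: under a standard DNA substitution model, such as the ones ModelTest-NG compares, an ambiguity code like R is read as "either A or G" (uncertainty about a single base), not as "both A and G present", so neither option models heterozygosity explicitly.

```python
# Sketch: two ways to turn per-site diploid genotypes into one sequence.
# Assumption: each site is a pair of alleles, e.g. ("A", "G").

# IUPAC nucleotide ambiguity codes for single bases and base pairs.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
}


def to_ambiguous(genotypes):
    """Encode each genotype as one IUPAC character (unknown pairs become 'N')."""
    return "".join(IUPAC.get(frozenset(pair), "N") for pair in genotypes)


def to_consensus(genotypes, reference):
    """Keep one base per site, preferring the reference allele at heterozygous sites."""
    out = []
    for (a, b), ref in zip(genotypes, reference):
        if a == b:
            out.append(a)        # homozygous: nothing to decide
        elif ref in (a, b):
            out.append(ref)      # heterozygous: keep the reference allele
        else:
            out.append(a)        # neither allele is the reference: arbitrary pick
    return "".join(out)


sites = [("A", "A"), ("A", "G"), ("C", "T"), ("G", "G")]
print(to_ambiguous(sites))          # ARYG
print(to_consensus(sites, "AGCG"))  # AGCG
```

The consensus version silently discards the second allele, while the ambiguity version keeps it but only as uncertainty.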

Thank you

diploid modeltest next-gen
6.2 years ago
Vitis

I think it depends on the taxonomic/phylogenetic level you're trying to resolve. Above the species level, heterozygosity is usually ignored, on the assumption that genomic divergence at this level has produced enough fixed differences between populations to resolve the relationships while ignoring intraspecific variation. For intraspecific relationships, you'll need to phase the genotypes and build haplotypes before using the data for phylogeny reconstruction, so that the haploid lineages can be resolved.
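Once genotypes are phased, building haplotypes amounts to splitting each site into its two alleles and writing one sequence per haplotype, so each individual contributes two tips to the alignment. A minimal sketch, assuming VCF-style phased calls in which allele indices have already been replaced by bases (e.g. "A|G"); the function name and input format are hypothetical.

```python
def split_haplotypes(phased_sites):
    """Split phased diploid calls such as "A|G" into two haplotype sequences."""
    hap1, hap2 = [], []
    for site in phased_sites:
        if "|" not in site:
            # "/" (or no separator) means the call is unphased
            raise ValueError(f"site {site!r} is not phased (expected 'X|Y')")
        a, b = site.split("|")
        hap1.append(a)
        hap2.append(b)
    return "".join(hap1), "".join(hap2)


h1, h2 = split_haplotypes(["A|A", "A|G", "T|C", "G|G"])
print(h1)  # AATG
print(h2)  # AGCG
```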


Yes, I agree with your point. However, I wanted to relax these assumptions, given that ignoring invariant or polymorphic sites can introduce bias into phylogenetic analyses. It is also true that there is no general rule, at least as far as I'm aware.

Concerning the "phased genotypes", I'm not sure about that. Reading the RAxML-NG manual, it seems that unphased genotypes can be used.


As far as I know, ignoring heterozygosity would affect branch length estimation in phylogenies at the species/subspecies level, but generally would not affect the tree topology. See this:

https://academic.oup.com/mbe/article/31/4/817/1100394

In terms of heterozygosity and phasing, based on what I've learned, there has not been much theoretical treatment of using heterozygous sites as phylogenetic signal (my theoretical background mostly comes from Inferring Phylogenies by Joseph Felsenstein). See the discussion here:

https://groups.google.com/forum/#!topic/raxml/D73d--TKE2E

Again, it seems the heterozygosity involved here didn't have an effect on the tree topology, but did have an effect on branch lengths.
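One way to gauge how much within-individual divergence a consensus sequence throws away is the fraction of heterozygous sites: a consensus sets it to zero. This is only an intuition for why branch lengths, rather than topology, would be affected, not a result from the linked discussion. A small sketch, assuming one allele pair per site with "N" marking missing data (the function name is hypothetical).

```python
def heterozygosity(genotypes, missing="N"):
    """Fraction of called sites whose two alleles differ."""
    # Drop sites where either allele is missing.
    called = [(a, b) for a, b in genotypes if missing not in (a, b)]
    if not called:
        return 0.0
    het_sites = sum(1 for a, b in called if a != b)
    return het_sites / len(called)


sites = [("A", "A"), ("A", "G"), ("C", "T"), ("G", "G"), ("N", "N")]
print(heterozygosity(sites))  # 0.5
```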


I totally agree with you, and I have already seen these publications, but I was hoping for some progress since 2014 ;) However, if you look at the RAxML-NG wiki documentation, as far as I can tell they list polymorphic nucleotides among the data types and their state order: "GENOTYPE (diploid unphased)".

You can find it at the bottom of this page on GitHub: https://github.com/amkozlov/raxml-ng/wiki/Input-data#analysis-type
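A diploid unphased genotype needs more states than plain DNA: over four nucleotides there are 4 homozygous and 6 heterozygous unordered allele pairs, i.e. 4·5/2 = 10 possible values, versus 16 ordered pairs if phase were known. The enumeration below is only a sketch of that count; its lexicographic order is an arbitrary choice and not necessarily the state order RAxML-NG uses, which is given on the wiki page linked above.

```python
from itertools import combinations_with_replacement, product

NUCLEOTIDES = "ACGT"

# Unordered allele pairs: A/C and C/A are the same unphased genotype.
unphased = ["".join(p) for p in combinations_with_replacement(NUCLEOTIDES, 2)]
# Ordered allele pairs: A|C and C|A differ once phase is known.
phased = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]

print(len(unphased), unphased)
# 10 ['AA', 'AC', 'AG', 'AT', 'CC', 'CG', 'CT', 'GG', 'GT', 'TT']
print(len(phased))
# 16
```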


I'm not saying that using heterozygous sites is strictly forbidden in phylogeny reconstruction, or that it would cause problems for RAxML. In my personal opinion, I've been hesitant to use something not explicitly dealt with in theory. Also, I suggested phasing because I think it may provide more information for sorting the lineages, though that has no theoretical foundation either. :)


Yes, I understood that you are not against heterozygous sites ;P , I was just wondering whether current tools allow them to be taken into account. However, I do not have a phased genome for the species I'm working on; I'm using the closest relative species available, which is still very distant in evolutionary history. By the way, Vitis, thank you for this conversation and exchange of opinions.
