I'm using nuclear genomic data from a diploid organism. I want to build a phylogenetic tree by concatenating all sequences. One of the first steps is to estimate the model of molecular evolution for the concatenated sequences, for instance with ModelTest-NG. However, I do not know how to deal with heterozygous sites.
Should I mark them as ambiguous sites, or should I simply use the consensus sequence?
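To make the two options concrete, here is a minimal Python sketch of how one diploid genotype per site could be collapsed into a single alignment character. It is only an illustration: the names `encode_site` and `encode_sequence`, the `mode` values, and the rule that "consensus" means keeping the first allele listed are assumptions made for this example, not anything taken from ModelTest-NG or RAxML-NG. The two-base IUPAC codes themselves are the standard ones.

```python
# Standard IUPAC codes for unordered pairs of nucleotides.
IUPAC_HET = {
    frozenset("AC"): "M",
    frozenset("AG"): "R",
    frozenset("AT"): "W",
    frozenset("CG"): "S",
    frozenset("CT"): "Y",
    frozenset("GT"): "K",
}
VALID = set("ACGT")


def encode_site(allele1, allele2, mode="ambiguity"):
    """Collapse one diploid genotype into one alignment character.

    mode="ambiguity": heterozygotes become IUPAC codes (A/G -> R).
    mode="first":     heterozygotes collapse to the first allele given,
                      a stand-in for a consensus rule (assumption).
    Anything that is not a plain A/C/G/T call is returned as "N".
    """
    a, b = allele1.upper(), allele2.upper()
    if a not in VALID or b not in VALID:
        return "N"
    if a == b:
        return a  # homozygous site: nothing to decide
    if mode == "ambiguity":
        return IUPAC_HET[frozenset((a, b))]
    if mode == "first":
        return a
    raise ValueError(f"unknown mode: {mode!r}")


def encode_sequence(genotypes, mode="ambiguity"):
    """Encode a list of (allele1, allele2) pairs as one sequence string."""
    return "".join(encode_site(a, b, mode) for a, b in genotypes)


# Hypothetical genotypes for four sites of one individual.
genotypes = [("A", "A"), ("A", "G"), ("C", "T"), ("G", "G")]
print(encode_sequence(genotypes, "ambiguity"))  # ARYG
print(encode_sequence(genotypes, "first"))      # AACG
```

One thing to keep in mind with the first option: under standard DNA substitution models an ambiguity code such as R is usually read as "A or G, unknown which", not as "both A and G are present", so the encoding alone does not make the program model heterozygosity.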
Yes, I agree with your point. However, I wanted to relax these assumptions, given that ignoring invariant sites or polymorphic sites can introduce bias into phylogenetic analyses. It is also true that there is no general rule, at least as far as I'm aware.
Concerning the "phased genotypes", I'm not sure about that. Reading the RAxML-NG manual, it seems that unphased genotypes can be used.
As far as I know, ignoring heterozygosity would affect branch-length estimation for phylogenies at the species/subspecies level, but generally would not affect the topology of the tree. See this:
In terms of heterozygosity and phasing, as far as I have learned, there has not been much theoretical treatment of using heterozygous sites as phylogenetic signal (my theoretical background mostly comes from Inferring Phylogenies by Joseph Felsenstein). See the discussions here:
Again, it seems the heterozygosity involved here didn't have an effect on the tree topology, but did have an effect on branch lengths.
I totally agree with you, and I have already seen these publications, but I was hoping for some progress since 2014 ;) However, if you look at the RAxML-NG wiki documentation, the listed state orders also include one with polymorphic nucleotides, I would say: "GENOTYPE (diploid unphased)".
You can find it at the bottom of this page on GitHub: https://github.com/amkozlov/raxml-ng/wiki/Input-data#analysis-type
I'm not saying that using heterozygous sites is strictly forbidden in phylogeny reconstruction, or that it would cause problems for RAxML. In my very personal opinion, I've been hesitant to use something that is not explicitly dealt with in theory. Also, I suggested phasing because I think it may provide more information for sorting the lineages, though without any theoretical foundation either. :)
Yes, I understood that you are not against heterozygous sites ;P I was just wondering whether current tools allow us to take them into account. However, I do not have a phased genome for the species I'm working on; I'm using the closest relative species available, which is still very distant in evolutionary terms. By the way, Vitis, thank you for this conversation and exchange of opinions.