I am using phased DNA sequence data to generate a haplotype list and then find the distribution of those haplotypes among populations. The pipeline so far is to generate the haplotype list and haplotype distribution in Arlequin, then transform these data and use them to make a map and pie charts in R. However, the list and distribution of haplotypes that Arlequin generates are incomplete. This is what I usually get:
Haplotype id    Haplotype definition
------------    --------------------
h1              TGGATTTG
h2              TGGTTTTC
h3
h5
h4              CGGTTTTG
What are the empty spaces for h3 and h5? Also, some haplotypes (ones that are not rare and are important) are not included in this list. Are they the empty spaces? Finally, the matrix with the haplotype distribution does not correspond to the total number of haplotypes per population. For example, in one population I have 4 individuals, which means I have 8 haplotypes that can be of the same or different types. In the matrix, this population shows 10 haplotypes. Where do these extra haplotypes come from? Is there another tool for this type of analysis? Can I generate a haplotype list and distribution in the same software? Thanks!
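To make the expected numbers concrete, here is a minimal sketch in Python of the check described above: list the distinct haplotypes, count them per population, and confirm that each population has exactly two haplotypes per individual. It assumes diploid individuals and phased sequences of equal length. The population and individual names (`popA`, `ind1`, and so on) and the function `haplotype_table` are made up for illustration; only the three sequences come from the output shown above. It is not how Arlequin works internally.

```python
from collections import Counter, defaultdict

# Hypothetical phased data: (population, individual, phased sequence).
# Each diploid individual contributes exactly two rows.
phased = [
    ("popA", "ind1", "TGGATTTG"),
    ("popA", "ind1", "TGGTTTTC"),
    ("popA", "ind2", "TGGATTTG"),
    ("popA", "ind2", "CGGTTTTG"),
    ("popB", "ind3", "TGGTTTTC"),
    ("popB", "ind3", "TGGTTTTC"),
]

def haplotype_table(records):
    """Return (sequence -> haplotype id, population -> Counter of ids)."""
    ids = {}                        # ids assigned in order of first appearance
    counts = defaultdict(Counter)   # per-population haplotype counts
    for pop, _ind, seq in records:
        if seq not in ids:
            ids[seq] = f"h{len(ids) + 1}"
        counts[pop][ids[seq]] += 1
    return ids, counts

ids, counts = haplotype_table(phased)

# Sanity check: a population of n diploid individuals must hold 2n haplotypes.
for pop, hap_counts in counts.items():
    n_individuals = len({ind for p, ind, _seq in phased if p == pop})
    assert sum(hap_counts.values()) == 2 * n_individuals

for seq, hap_id in ids.items():
    print(hap_id, seq)
for pop, hap_counts in counts.items():
    print(pop, dict(hap_counts))
```

If a population with 4 individuals fails this check (for example, 10 counted instead of 8), the input file probably contains extra or duplicated rows for that population, which would be worth ruling out before looking for another tool.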