I am trying to create a phylogeny and haplotype network for a specific gene using data from the 1000 Genomes Project (1kgp). I want to answer the following questions:
- What major haplotypes are seen in this specific region of interest?
- How does each haplotype relate to the others?
- Which populations are most similar at this region?
I believe I need to do the following steps, but I am unsure whether this is the right track:
- Subset the 1kgp VCF down to the region of interest/populations of interest
- Remove all private SNPs/SNPs with poor coverage across samples with VCFtools
- Convert VCF to PHYLIP using PGDSpider
- Create phylogenetic tree with RAxML
- Visualize phylogenetic tree with FigTree; and finally,
- Make a haplotype network, which I am completely lost on how to do.
I already know how to subset the 1000 Genomes Project VCFs down to the region of interest and how to use VCFtools/PGDSpider.
How do I choose an outgroup in RAxML, and how do I get sequence files from non-human primates that match those of the 1kgp? Additionally, I am interested in the individual alleles, not the people, so how do I show a "phylogeny" of each haplotype? I know the 1kgp is phased, so it should be possible?
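To make the per-haplotype idea concrete, here is a minimal Python sketch of what I think it would involve: splitting each phased genotype (e.g. `0|1`) into two sequences per sample, one per chromosome copy. It is only a sketch under assumptions: the input is a phased, diploid, SNP-only VCF, the sample names `S1`/`S2` and all genotypes are made-up toy data, and the function name `haplotypes_from_vcf` is my own, not part of any tool.

```python
from collections import Counter

def haplotypes_from_vcf(lines):
    """Build one allele string per haplotype from phased VCF text lines.

    Assumes diploid, phased (0|1 style) genotypes; indels and symbolic
    alleles are skipped so that all haplotypes keep the same length.
    """
    samples = []
    haps = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            for s in samples:
                haps[s + "_1"] = []  # first chromosome copy
                haps[s + "_2"] = []  # second chromosome copy
            continue
        alleles = [fields[3]] + fields[4].split(",")
        if any(len(a) != 1 or a not in "ACGT" for a in alleles):
            continue  # keep SNPs only
        gt_idx = fields[8].split(":").index("GT")
        for s, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_idx]
            if "|" not in gt:
                raise ValueError(f"unphased or haploid genotype for {s}: {gt}")
            left, right = gt.split("|")
            haps[s + "_1"].append("N" if left == "." else alleles[int(left)])
            haps[s + "_2"].append("N" if right == "." else alleles[int(right)])
    return {name: "".join(seq) for name, seq in haps.items()}

# Toy input (hypothetical samples and genotypes, not real 1kgp data)
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t0|0",
    "1\t150\t.\tC\tT\t.\tPASS\t.\tGT\t0|0\t1|0",
    "1\t200\t.\tG\tA\t.\tPASS\t.\tGT\t1|1\t0|1",
]

haps = haplotypes_from_vcf(toy_vcf)
counts = Counter(haps.values())  # how many chromosomes carry each distinct haplotype

for name, seq in haps.items():
    print(f">{name}\n{seq}")  # FASTA-style output, one record per haplotype
```

Each sample then contributes two tips (`S1_1`, `S1_2`, ...) instead of one, and `counts` gives the frequency of each distinct haplotype, which is the quantity I would want to show for the "major haplotypes" question.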
What is the best method to make a haplotype network from the 1kgp datasets?
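As a rough illustration of the kind of output I am after, the sketch below connects distinct haplotypes by the number of differing sites using a plain minimum spanning tree (Prim's algorithm). This is a simplification I am assuming for illustration, not a published network method such as TCS or median-joining, and the haplotype strings are made-up toy data.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def minimum_spanning_tree(seqs):
    """Prim's algorithm over the distinct haplotypes in seqs.

    Returns a list of edges (haplotype_in_tree, new_haplotype, n_differences).
    """
    nodes = sorted(set(seqs))  # collapse identical haplotypes into one node
    if not nodes:
        return []
    in_tree = {nodes[0]}
    edges = []
    while len(in_tree) < len(nodes):
        # pick the cheapest edge from the tree to a node not yet in it
        dist, a, b = min(
            (hamming(a, b), a, b)
            for a in in_tree
            for b in nodes
            if b not in in_tree
        )
        edges.append((a, b, dist))
        in_tree.add(b)
    return edges

# Toy haplotypes (hypothetical); "ACA" appears twice and is collapsed to one node
toy_haplotypes = ["ACA", "GCA", "ATG", "ACA"]
edges = minimum_spanning_tree(toy_haplotypes)

for a, b, dist in edges:
    print(f"{a} -- {b} ({dist} difference(s))")
```

In a real network I would also expect node sizes to reflect haplotype frequency and node colours to reflect population, which this sketch does not attempt.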
Thanks so much!