I am currently comparing multiple breeds of one domestic animal and, as part of my analysis, I plan to generate a phylogenetic tree. I started with whole-genome sequencing data and have since called the SNPs.
Methods I have tried so far (you can skip ahead if you want!):
Generating trees with SNPhylo. The trees were uninformative: every breed was equally distant from every other breed.
Concatenating multiple genes together to build a tree, as suggested here. This made a slightly more informative tree, but because my datasets came from different sources (some from Illumina and some from SOLiD sequencing), the uneven read coverage meant that some breeds had far more SNPs in certain regions.
Now I am turning to generating trees from SNPs. I have extracted the regions (as a BED file) that have good coverage depth in all breeds and used them to subset my VCF files. But I am confused about what to do next. I have read about:
1. Generating a SNP matrix
2. Concatenating the SNPs into a "fake" sequence
3. Introducing the SNPs from the VCF file into the reference genome (the same thing I did when I built a tree from concatenated gene sequences)
4. Using the SNPRelate R package (I can't find any information about which method this package uses to build a tree. Is it maximum likelihood, neighbour-joining, or something else?)
I am leaning towards the 2nd or 3rd method. I don't know how to do 2 (this script doesn't seem to take into account the position of the SNP in the genome, so it wouldn't make sense to apply it to VCF files that contain many different sites, where some sites are common to all breeds and others are unique to one). I already know how to do 3, but is that recommended for random, very small sections of the genome, where you don't know whether a section is highly conserved or divergent, or whether it covers a gene? Also, what is the most recommended/established method? Thanks!
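To make the position concern in method 2 concrete, here is a minimal Python sketch of what I think a position-aware concatenation would have to do. It is only an illustration under stated assumptions, not a tested pipeline: the names `encode`, `concatenate_snps`, `to_fasta`, and the toy breeds and sites are all hypothetical, the calls are assumed to be already parsed out of the VCF files into dictionaries, and a site missing from a breed's calls is assumed to be homozygous reference (which is only defensible because the regions were pre-filtered for good coverage in every breed).

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position)

# IUPAC ambiguity codes for heterozygous biallelic SNP calls
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def encode(alleles: str) -> str:
    """Collapse a diploid call such as 'AG' into a single character."""
    distinct = frozenset(alleles.upper())
    if len(distinct) == 1:
        return next(iter(distinct))
    return IUPAC.get(distinct, "N")  # anything unexpected becomes N


def concatenate_snps(
    ref: Dict[Site, str],
    calls: Dict[str, Dict[Site, str]],
) -> Dict[str, str]:
    """Build one pseudo-sequence per sample over the union of variant sites.

    ref   maps every site in the union to its reference base.
    calls maps sample name -> {site: diploid call, e.g. 'AG'}.
    ASSUMPTION: a site absent from a sample's calls is homozygous reference.
    """
    # A fixed site order keeps every column of the alignment homologous.
    # The order itself (plain tuple sort) does not matter for the tree.
    sites = sorted(ref)
    return {
        sample: "".join(
            encode(sample_calls.get(site, ref[site] * 2)) for site in sites
        )
        for sample, sample_calls in calls.items()
    }


def to_fasta(alignment: Dict[str, str]) -> str:
    """Format the pseudo-sequences as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in alignment.items())


# Toy data: three sites, three breeds (all values invented for illustration)
ref = {("chr1", 100): "A", ("chr1", 250): "C", ("chr2", 40): "G"}
calls = {
    "breedA": {("chr1", 100): "GG"},
    "breedB": {("chr1", 100): "AG", ("chr2", 40): "TT"},
    "breedC": {},
}
alignment = concatenate_snps(ref, calls)
print(to_fasta(alignment), end="")
```

Because every breed gets a character at every site in the union, a site that is variable in only one breed still lines up with the reference base in the others, which is exactly what the script I found does not seem to guarantee.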