Question: Confusion regarding generating a phylogenetic tree from SNP data
3.8 years ago, ehzed wrote:


I am currently comparing multiple breeds of one domestic animal and, as part of my analysis, I plan to generate a phylogenetic tree. I started with whole-genome sequencing data and have since called the SNPs.

Methods I have tried so far (you can skip ahead if you want!):

  • Generating trees using SNPhylo (the trees were uninformative; every breed was equally distant from every other)

  • Concatenating multiple genes together to build a tree, as suggested here. This made a slightly more informative tree, but because my datasets were from different sources (some from Illumina and some from SOLiD sequencing), the uneven read coverage meant that some breeds had far more SNPs in certain areas.

My problem:

Now I am turning to generating trees from SNPs. I have extracted regions (BED file) that have good coverage depth in all breeds and used them to subset my VCF files. But I am confused about what to do next. I have read about:

  1. Generating a SNP matrix

  2. Concatenating SNPs into a "fake" sequence

  3. Introducing the SNPs in the VCF file into the reference genome (the same as what I did when I built a tree from concatenated gene sequences)

  4. Using the SNPRelate R package (I can't find information on what method this package uses to make a tree: is it maximum likelihood, neighbour-joining...?).

I am leaning towards the 2nd or 3rd method. I don't know how to do 2 (this script doesn't seem to take into account the position of the SNP in the genome, so it wouldn't make sense to apply it to VCF files that contain many different sites, where some sites are common to all breeds and others are unique). I already know how to do 3, but is that recommended for random, very small sections of the genome where you don't know whether a section is very conserved or divergent, or whether it covers a gene? Also, what is the most recommended/established method? Thanks!
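To make option 2 concrete, here is a minimal Python sketch of a position-aware concatenation. Everything in it is illustrative: the function names `build_snp_alignment` and `to_fasta`, the dictionary layout, and the sample data are all hypothetical, and the calls would in practice be parsed from the subsetted VCF files. It also assumes that a site missing from a breed's calls is homozygous reference, which is only reasonable because the regions were pre-filtered for good coverage in every breed; otherwise such sites should be written as `N`.

```python
def build_snp_alignment(calls, reference, missing="N"):
    """Build one pseudo-sequence per sample with one column per SNP site.

    calls:     {sample: {(chrom, pos): base}} variant calls per sample
    reference: {(chrom, pos): base} reference base at each site
    Columns are the union of all sites, so every sample gets the same
    sites in the same order. A sample with no call at a site receives the
    reference base (ASSUMPTION: the site is well covered and therefore
    homozygous reference), or `missing` if the reference base is unknown.
    """
    sites = sorted({site for sample_calls in calls.values() for site in sample_calls})
    alignment = {
        sample: "".join(
            sample_calls.get(site, reference.get(site, missing)) for site in sites
        )
        for sample, sample_calls in calls.items()
    }
    return sites, alignment


def to_fasta(alignment):
    """Serialise the pseudo-sequences as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in alignment.items())


# Hypothetical toy data: three breeds, three variable sites in total.
calls = {
    "breedA": {("chr1", 100): "T", ("chr1", 250): "G"},
    "breedB": {("chr1", 100): "T", ("chr2", 40): "A"},
    "breedC": {},
}
reference = {("chr1", 100): "C", ("chr1", 250): "A", ("chr2", 40): "G"}

sites, alignment = build_snp_alignment(calls, reference)
print(to_fasta(alignment))
```

Because every column corresponds to the same genomic position in every breed, sites that are unique to one breed and sites shared by all are handled the same way. The resulting FASTA contains variable sites only, which matters for tree-building programs that expect invariant sites to be present or corrected for.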


Hi, I am trying to do a similar thing. Have you found a solution yet?

written 2.4 years ago by JJ520
3.8 years ago, Petr Ponomarenko (United States / Los Angeles) wrote:

There is another option if you want to compare populations (breeds with many samples each) with one another: you can use ADMIXTURE to represent each sample as a K-dimensional vector of ancestry proportions, where K is relatively small. This captures the main aggregated information about each population for comparison.
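As a rough illustration of that comparison, the Python sketch below averages per-sample ancestry vectors by breed and measures the distance between the breed means. It assumes the usual ADMIXTURE `.Q` layout (one row per sample, K whitespace-separated ancestry fractions, in the same order as the input samples). The helper names, the toy values, and the choice of Euclidean distance are illustrative assumptions, not something ADMIXTURE prescribes.

```python
from collections import defaultdict
from math import sqrt


def parse_q(text):
    """Parse .Q-style text: one row per sample, K ancestry fractions."""
    return [[float(x) for x in line.split()] for line in text.splitlines() if line.strip()]


def breed_means(q_rows, breeds):
    """Average the K-dimensional vectors of all samples in each breed.

    breeds: one label per row of q_rows, in the same sample order.
    """
    grouped = defaultdict(list)
    for row, breed in zip(q_rows, breeds):
        grouped[breed].append(row)
    return {
        breed: [sum(col) / len(rows) for col in zip(*rows)]
        for breed, rows in grouped.items()
    }


def euclidean(u, v):
    """Euclidean distance between two ancestry vectors (one simple choice)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical toy data with K = 2: two samples of breed A, one of breed B.
q_text = "0.9 0.1\n0.7 0.3\n0.2 0.8\n"
breeds = ["A", "A", "B"]

means = breed_means(parse_q(q_text), breeds)
print(means)
print(euclidean(means["A"], means["B"]))
```

The pairwise distances between breed means could then be fed to a clustering or neighbour-joining step, with the caveat that the result depends on the chosen K.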
