Question: Confusion regarding generating a phylogenetic tree from SNP data
ehzed10 wrote (8 months ago):

Hi,

I am comparing multiple breeds of a single domestic animal species, and as part of my analysis I plan to generate a phylogenetic tree. I started with whole-genome sequencing data and have since called SNPs.

Methods I have tried so far (you can skip ahead if you want!):

  • Generating trees with SNPhylo (the trees were uninformative: every breed was equally distant from every other)

  • Concatenating multiple genes to build a tree, as suggested here. This produced a slightly more informative tree, but because my datasets came from different sources (some from Illumina and some from SOLiD sequencing), the uneven read coverage meant that some breeds had far more SNPs in certain regions.
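One way to limit the coverage imbalance described above is to keep only the regions that are well covered in every breed before comparing SNPs. The sketch below assumes you already have, per breed, a list of merged well-covered intervals as BED-style `(chrom, start, end)` tuples (how those are derived, e.g. from a depth threshold, is up to you); the function names are hypothetical. In practice, running `bedtools intersect` successively over per-breed BED files does the same job.

```python
from functools import reduce


def _intersect_two(a, b):
    """Intersect two lists of BED-style (chrom, start, end) intervals.

    Assumes intervals within each list are already merged (non-overlapping),
    e.g. after `bedtools merge`; coordinates are 0-based, half-open.
    """
    out = []
    for chrom in sorted({c for c, _, _ in a} & {c for c, _, _ in b}):
        xs = sorted((s, e) for c, s, e in a if c == chrom)
        ys = sorted((s, e) for c, s, e in b if c == chrom)
        i = j = 0
        while i < len(xs) and j < len(ys):
            start = max(xs[i][0], ys[j][0])
            end = min(xs[i][1], ys[j][1])
            if start < end:  # non-empty overlap
                out.append((chrom, start, end))
            # Advance whichever interval ends first; the other may overlap the next one.
            if xs[i][1] < ys[j][1]:
                i += 1
            else:
                j += 1
    return out


def shared_covered_regions(per_breed):
    """Regions well covered in every breed: the intersection across all lists."""
    return reduce(_intersect_two, per_breed)
```

For example, intervals `[("chr1", 0, 100), ("chr1", 200, 300)]`, `[("chr1", 50, 250)]` and `[("chr1", 60, 220)]` reduce to `[("chr1", 60, 100), ("chr1", 200, 220)]`.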

My problem:

Now I am turning to building trees directly from SNPs. I have extracted the regions (as a BED file) that have good coverage depth in all breeds and used them to subset my VCF files. But I am unsure what to do next. I have read about:

  1. Generating a SNP matrix

  2. Concatenating SNPs into a "fake" sequence

  3. Introducing the SNPs from the VCF file into the reference genome (the same approach I used when I built a tree from concatenated gene sequences)

  4. Using the SNPRelate R package (I can't find information on which method this package uses to build a tree: is it maximum likelihood, neighbour-joining, ...?)
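For options 1 and 4, a distance-based tree is one common route. The sketch below is hypothetical (`allele_sharing_distance`, `distance_matrix` and `neighbour_joining` are illustrative names, not from any package) and is not a description of what SNPRelate does internally. It assumes a SNP matrix coded as ALT-allele dosages (0, 1, 2; `None` for a missing call) with sites in the same order for every sample, computes a simple allele-sharing distance (one minus the mean identity-by-state allele sharing), and builds an unrooted neighbour-joining tree in Newick format.

```python
def allele_sharing_distance(g1, g2):
    """Mean per-site distance between two samples' ALT-allele dosages.

    Each site called in both samples contributes |g1 - g2| / 2, i.e. 0, 0.5 or 1;
    sites where either sample is None (missing) are skipped.
    """
    shared = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not shared:
        raise ValueError("no sites called in both samples")
    return sum(abs(a - b) / 2 for a, b in shared) / len(shared)


def distance_matrix(genotypes):
    """genotypes: {sample: [dosage, ...]} -> symmetric {sample: {sample: distance}}."""
    names = list(genotypes)
    return {a: {b: 0.0 if a == b else allele_sharing_distance(genotypes[a], genotypes[b])
                for b in names} for a in names}


def neighbour_joining(dist):
    """Neighbour-joining on a symmetric {name: {name: distance}} dict; returns Newick."""
    d = {a: dict(row) for a, row in dist.items()}
    nodes = list(d)
    if len(nodes) < 3:
        raise ValueError("need at least three samples")
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Pick the pair minimising the NJ Q-criterion (first pair wins ties).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        la = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:g},{b}:{lb:g})"  # the new internal node, labelled by its subtree
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (a, b):
                d[u][k] = d[k][u] = (d[a][k] + d[b][k] - d[a][b]) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    # Join the last three nodes at a single (unrooted) centre.
    a, b, c = nodes
    la = (d[a][b] + d[a][c] - d[b][c]) / 2
    lb = (d[a][b] + d[b][c] - d[a][c]) / 2
    lc = (d[a][c] + d[b][c] - d[a][b]) / 2
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"
```

With real data, an established implementation such as `ape::nj` in R or Biopython's `DistanceTreeConstructor` is preferable; note also that NJ can produce small negative branch lengths when the distances are not perfectly tree-like.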

I am leaning towards the 2nd or 3rd method. I don't know how to do the 2nd: this script doesn't seem to take the genomic position of each SNP into account, so it wouldn't make sense to apply it to VCF files that contain many different sites, some common to all breeds and others unique to one. I already know how to do the 3rd, but is it recommended for small, randomly located sections of the genome where you don't know whether a section is highly conserved or divergent, or whether it covers a gene? Also, which method is the most recommended or established? Thanks!
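The position problem raised for option 2 can be handled by keying every call on `(chrom, pos)` and building the pseudo-sequence over the union of all sites, so each column is the same genomic site in every sample. This is a minimal sketch with hypothetical function names, assuming one single-sample VCF per breed/sample, GT as the first FORMAT field, SNPs only (indels skipped), heterozygotes written as IUPAC ambiguity codes, and, by default, that a site absent from a sample's VCF within the shared well-covered regions is homozygous reference. That last assumption only holds if the VCFs were already restricted to those regions; set `absent_as_ref=False` to write `N` instead.

```python
BASES = {"A", "C", "G", "T"}
IUPAC = {frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
         frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M"}


def parse_single_sample_vcf(lines):
    """Return {(chrom, pos): (ref, base)} for SNP calls in a single-sample VCF."""
    calls = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 10:
            continue  # no sample column
        chrom, pos, ref, alts = f[0], int(f[1]), f[3].upper(), f[4].upper().split(",")
        if ref not in BASES or any(a not in BASES for a in alts):
            continue  # skip indels, '*', symbolic and missing ALT alleles
        gt = f[9].split(":")[0].replace("|", "/")
        if "." in gt:
            continue  # no call
        alleles = [ref] + alts
        bases = {alleles[int(i)] for i in gt.split("/")}
        calls[(chrom, pos)] = (ref, bases.pop() if len(bases) == 1
                               else IUPAC.get(frozenset(bases), "N"))
    return calls


def build_snp_alignment(sample_calls, absent_as_ref=True):
    """{sample: calls} -> (sorted union of sites, {sample: pseudo-sequence})."""
    ref_at = {site: ref for calls in sample_calls.values()
              for site, (ref, _) in calls.items()}
    # Chromosomes sort as strings ("chr10" < "chr2"); only consistency across samples matters.
    sites = sorted(ref_at)
    seqs = {}
    for sample, calls in sample_calls.items():
        seqs[sample] = "".join(
            calls[s][1] if s in calls else (ref_at[s] if absent_as_ref else "N")
            for s in sites)
    return sites, seqs


def to_fasta(seqs, width=60):
    """Format {name: sequence} as FASTA text, wrapping lines at `width`."""
    out = []
    for name, seq in seqs.items():
        out.append(f">{name}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

The resulting FASTA can be passed to a tree program; check how your chosen program treats IUPAC ambiguity codes, since some treat them as missing data.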

Petr Ponomarenko2.4k (United States / Los Angeles / ALAPY.com) wrote (8 months ago):

There is another option if you want to compare populations (breeds, each with many samples) with one another: you can try ADMIXTURE (https://www.genetics.ucla.edu/software/admixture/), which reduces each sample to a K-dimensional vector of ancestry proportions, where K is relatively small. This captures the main aggregate information about each population for comparison.
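As a rough illustration of comparing breeds this way, the sketch below reads the rows of an ADMIXTURE `.Q` file (one row per individual with K ancestry fractions, in the same order as the input `.fam` file), averages them per breed, and computes Euclidean distances between breed means. The per-breed averaging and the Euclidean distance are illustrative choices of mine, not something ADMIXTURE itself does, and the function names are hypothetical; the breed labels must be supplied in `.fam` order.

```python
import math
from collections import defaultdict


def breed_mean_ancestry(q_lines, breeds):
    """Average ADMIXTURE Q rows per breed -> {breed: [mean fraction per cluster]}."""
    rows = [[float(x) for x in line.split()] for line in q_lines if line.strip()]
    if len(rows) != len(breeds):
        raise ValueError("need exactly one breed label per row of the .Q file")
    grouped = defaultdict(list)
    for row, breed in zip(rows, breeds):
        grouped[breed].append(row)
    return {b: [sum(col) / len(rs) for col in zip(*rs)] for b, rs in grouped.items()}


def pairwise_distances(means):
    """Euclidean distance between every pair of breed mean ancestry vectors."""
    names = sorted(means)
    return {(a, b): math.dist(means[a], means[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
```

Such a distance matrix summarises population structure at the chosen K rather than being a phylogeny, so it complements, rather than replaces, a SNP-based tree.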

Powered by Biostar version 2.3.0