Hi all, I need some suggestions for mapping the number of SNPs onto my tree. The goal is to start from a sequence alignment, build a phylogenetic tree, and see the number of SNPs on each branch of that tree. Some people use Mesquite for this, but no one says how! It would also be nice to have the list of SNP positions for each branch. If you know a way to do it, even with programming, please share your experience. Kind regards, Amine
One way to do it is to exploit the theory of splits in a phylogenetic tree. The edge you want to describe defines a split. This means you split not only the tree but also the underlying MSA, so you get two groups of already-aligned sequences. Now iterate over every position of the two groups and, ANOVA-style, calculate the edit distance within each group and between the two groups. Feed this into a test such as Fisher's exact test to get the positions that are responsible for the edge: positions with an E-value below your chosen threshold are the SNPs responsible for that edge. For quartet mapping, you need to generate four different subtrees: starting from the split edge (the one for which you want to infer the important SNPs), create the left-top, left-bottom, right-top and right-bottom subtrees. Then incorporate the character state (nucleotide) of the edge you are analyzing. To illustrate, you should create something like this: [image: tree split into subtrees A, B, C and D around the investigated edge] The red edge is the edge you are investigating. Now, if the distance from A/B to the state of the red edge is shorter than the distance to C/D (or the distance to C/D is shorter than to A/B), the site supports the edge; if not (that is, the two distances are equal), the site gives no support for the edge. Hence you can also apply Fisher's exact test over these per-site results, and you are good to go to select a subset of SNPs. Careful: this only handles binary (biallelic) SNPs.
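Below is a minimal sketch of the split-based test only, assuming the alignment is a Python dict of taxon name to aligned sequence and the split is given as the set of taxa on one side of the edge (both hypothetical input formats, as are the helper names `fisher_exact_2x2` and `split_snps`). It does not implement the ANOVA-style distance comparison or the quartet mapping. Instead, for each biallelic column it builds a 2×2 table (side of the split × allele) and runs a two-sided Fisher's exact test, written with `math.comb` so no SciPy is needed. The E-value is assumed here to be the p-value multiplied by the number of columns tested (a Bonferroni-style correction). Monomorphic columns, multi-allelic columns and columns with gaps are skipped.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more probable than the observed one
    # (small relative tolerance guards against floating-point ties)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def split_snps(alignment, side_a, threshold=0.05):
    """Hypothetical helper: [(column, e_value), ...] for biallelic columns tied to a split.

    alignment: dict of taxon name -> aligned sequence (all the same length)
    side_a:    taxon names on one side of the edge; all other taxa form side B
    """
    in_a = set(side_a)
    side_a = [t for t in alignment if t in in_a]
    side_b = [t for t in alignment if t not in in_a]
    length = len(next(iter(alignment.values())))

    p_values = []
    for col in range(length):
        states = sorted({seq[col] for seq in alignment.values()})
        # Keep only binary SNP columns: exactly two states and no gaps
        if len(states) != 2 or "-" in states:
            continue
        allele = states[0]
        a = sum(alignment[t][col] == allele for t in side_a)
        c = sum(alignment[t][col] == allele for t in side_b)
        # Rows: side A / side B; columns: first allele / second allele
        p_values.append((col, fisher_exact_2x2(a, len(side_a) - a, c, len(side_b) - c)))

    # Assumed E-value: p-value multiplied by the number of columns tested
    n_tests = len(p_values)
    return [(col, p * n_tests) for col, p in p_values if p * n_tests < threshold]


if __name__ == "__main__":
    # Toy alignment: column 0 separates t1-t5 from t6-t10, column 1 is
    # monomorphic, column 2 is unrelated to the split
    toy = {
        "t1": "ACA", "t2": "ACG", "t3": "ACA", "t4": "ACG", "t5": "ACA",
        "t6": "GCG", "t7": "GCA", "t8": "GCG", "t9": "GCA", "t10": "GCG",
    }
    print(split_snps(toy, {"t1", "t2", "t3", "t4", "t5"}))
```

To get a SNP list per branch, you would call `split_snps` once per internal edge of the tree. With Biopython, for example, `clade.get_terminals()` on each internal clade gives the taxa on one side of the edge above it. If SciPy is available, `scipy.stats.fisher_exact` can replace the hand-written test.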
I can't find the code at the moment, sorry about that. Don't hesitate to ask if you still need it.