Question: Clustering samples within a maximum SNP distance
samuel.lipworth (University of Oxford) wrote, 2.1 years ago:

Hi,

I want to find all clusters among 500 samples whose members are within a maximum SNP distance of, say, 12 SNPs. I have a matrix of pairwise SNP distances but need an algorithm to cluster the samples: something like hierarchical clustering that stops at a maximum distance of 12. I'm not sure how to do this in, e.g., R. Any ideas?

Thanks


Could you please give an example of the data and the output you want? Also, if you can explain the reason for the question, we might be able to find a solution faster.

Comment by Petr Ponomarenko, 2.1 years ago

Sure. Here is an example: a lower-triangular matrix of SNP distances between 4 samples.

      S1    S2    S3    S4
S1    0
S2    500   0
S3    34    4     0
S4    19    20    3     0

I can, of course, reconstruct the phylogeny (e.g. by maximum likelihood, ML), which would show a reconstructed SNP distance of <12 between samples 3 and 2, and between samples 4 and 3. Essentially, I want to define all clusters in which every member is within 12 SNPs of its nearest neighbour in the cluster. I could do this by inspecting the ML tree by eye, but that becomes tedious with large data sets. The reason for doing this is to look for evidence of transmission.
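The criterion described here (every member within 12 SNPs of its nearest neighbour in the cluster) corresponds to single-linkage clustering cut at a height of 12, which is the same as finding the connected components of the graph that links every pair of samples at distance <= 12. A minimal Python sketch under that interpretation, using a union-find structure; the function name `snp_clusters` and the inclusive `<=` threshold are illustrative assumptions, not something from the thread. In R, something along the lines of `cutree(hclust(as.dist(m), method = "single"), h = 12)`, with `m` the full symmetric distance matrix, should give the same grouping.

```python
def snp_clusters(lower, max_dist=12):
    """Group samples into single-linkage clusters (hypothetical helper).

    `lower` is a lower-triangular distance matrix given as a list of rows:
    row i holds the distances from sample i to samples 0..i (the final
    self-distance of 0 is ignored). Two samples end up in the same cluster
    if they are linked by a chain of pairwise distances, each <= max_dist.
    """
    n = len(lower)
    parent = list(range(n))  # union-find forest: each sample starts alone

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, row in enumerate(lower):
        for j in range(i):  # only pairs j < i, skipping the diagonal
            if row[j] <= max_dist:
                parent[find(i)] = find(j)  # merge the two clusters

    # Collect members by root, numbering samples from 1 as in the thread.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i + 1)
    return sorted(groups.values())


# The 4-sample example matrix from the thread.
example = [
    [0],
    [500, 0],
    [34, 4, 0],
    [19, 20, 3, 0],
]

print(snp_clusters(example, max_dist=12))  # → [[1], [2, 3, 4]]
```

For 500 samples this checks about 125,000 pairs, which is cheap. Because the grouping is transitive, samples 2 and 4 land in the same cluster even though their direct distance (20) exceeds 12; they are connected through sample 3.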

Reply by samuel.lipworth, 2.1 years ago

If I understand correctly, you want to cluster in a binary space: any distance <12 counts as equally "close" (coded 0), and any distance >=12 counts as "far" (coded 1). Your matrix then becomes:

      S1    S2    S3    S4
S1    0
S2    1     0
S3    1     0     0
S4    1     1     0     0

Do you then want to cluster that? If so, in R you can use dist(x, method = "binary") as the distance measure (this is the Jaccard distance) and pass the resulting dist object to a clustering function such as hclust. Alternatively, you could start with binary co-clustering using coclusterBinary from the blockcluster package: https://cran.r-project.org/web/packages/blockcluster/vignettes/blockcluster_tutorial.pdf
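A minimal Python sketch of the first two steps suggested here, assuming `x` is the full symmetric 0/1 matrix: threshold the SNP distances at 12, then compute the "binary" distance between rows, following the definition in R's `?dist` (the proportion of bits in which only one is on, among those in which at least one is on). The resulting matrix is what would then be handed to a clustering routine such as `hclust`. The helper names and the handling of two all-zero rows are assumptions for illustration.

```python
def binarize(full, cutoff=12):
    # 0 = "close" (distance < cutoff), 1 = "far" (distance >= cutoff),
    # matching the coding used in the reply above. Hypothetical helper.
    return [[0 if d < cutoff else 1 for d in row] for row in full]


def binary_dist(a, b):
    # Among positions where at least one vector is 1, the fraction where
    # exactly one is 1 (Jaccard distance). Hypothetical helper.
    either = sum(1 for x, y in zip(a, b) if x or y)
    only_one = sum(1 for x, y in zip(a, b) if x != y)
    # Assumption: two all-zero rows are treated as identical (distance 0).
    return only_one / either if either else 0.0


def pairwise_binary_dist(rows):
    # Full n x n matrix of binary distances between all pairs of rows.
    n = len(rows)
    return [[binary_dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]


# Full symmetric version of the thread's 4-sample example.
full = [
    [0, 500, 34, 19],
    [500, 0, 4, 20],
    [34, 4, 0, 3],
    [19, 20, 3, 0],
]

b = binarize(full)
for row in pairwise_binary_dist(b):
    print([round(d, 2) for d in row])
```

Note that this distance compares how similar two samples' sets of "close" and "far" partners are, rather than measuring the SNP distance between the two samples directly.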

Reply by Petr Ponomarenko, 2.1 years ago
Powered by Biostar version 2.3.0