Question

Comparing MDS and Cluster analysis

6.9 years ago

beatrice.baldi2

Hello everyone. This is my first post here, so I hope I don't break any rules! I have a set of 30 SNPs sampled from 450 individuals. I have performed both an MDS and a cluster analysis on my dataset, and in both cases I obtained two distinct groups. I would like to check whether the two groups contain the same individuals in the MDS and in the cluster analysis. Do you have any suggestions on how I could do this?

Thanks for your help!

Tags: SNP, R, MDS

written 6.9 years ago by beatrice.baldi2 • updated by Jean-Karim Heriche

What values are in your data matrix? Are they the variant allele frequencies (VAF) of each SNP in your samples? When you say SNPs, I assume you mean rsIDs, each with a VAF. First of all, 30 SNPs is not very many, although your number of individuals seems fairly large. When you see 2 distinct groups of individuals, is there a biological explanation for them, for example a dominant allele or a disease state? You need to understand that first: why you see 2 groups, and whether the split is biologically relevant.

Coming to your question: MDS is different from cluster analysis. If you use cmdscale and plot the dimensions to see how the data points spread across your samples, that is fine. PCA can also do that, although PCA is somewhat different from MDS. If you want to do clustering, you can simply make a heatmap of the correlations computed from your matrix (30 SNPs in rows, 450 individuals in columns) to see how the groups separate.

Keep in mind that the two approaches rest on different mathematics. Clustering can be based on distances or on correlation, and the linkage method can be complete, average or ward.D2, so different methods will give different groupings depending on your data and its noise. I don't think it is always a good idea to expect MDS and clustering to produce the same grouping. You can, however, plot the heatmap of the correlation values, then fine-tune the clustering method and the distance measure used for the dendrogram to see whether the result resembles the pattern in your MDS. The two can converge, but the ordering of the samples will not always be the same.

Bear in mind that MDS and PCA look for the directions of greatest variation across your entire data set and combine them to reduce the data to a small number of dimensions. To see where each sample ends up, store the result in an R object and retrieve the order of the samples from it. In theory, though, fine-grained differences between individual observations are better explored with clustering than with eigenvalues or MDS.

Reply by ivivek_ngs, 6.9 years ago
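The reply above points out that the grouping depends on the distance and linkage method chosen. As a hedged illustration only (not what either poster ran, and in Python rather than R), the sketch below does naive agglomerative clustering of individuals using 1 minus Pearson correlation as the distance, with either complete or average linkage, and stops at `k` clusters. The `genotypes` layout (individual name mapped to a list of per-SNP values such as 0/1/2 allele counts) and the toy data are assumptions; ward.D2 is not implemented, and an individual whose values are all identical makes the correlation undefined (division by zero).

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (fails if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_distance(x, y):
    """1 - r: 0 for perfectly correlated profiles, 2 for perfectly anti-correlated."""
    return 1.0 - pearson(x, y)


def agglomerate(genotypes, k, linkage="average"):
    """Naive agglomerative clustering of individuals down to k clusters.

    genotypes: dict mapping individual name -> list of per-SNP values (assumed layout).
    linkage: "complete" (max pairwise distance) or "average" (mean pairwise distance).
    """
    names = list(genotypes)
    dist = {}
    for a, b in combinations(names, 2):
        d = correlation_distance(genotypes[a], genotypes[b])
        dist[(a, b)] = dist[(b, a)] = d

    def cluster_distance(c1, c2):
        pair_d = [dist[(a, b)] for a in c1 for b in c2]
        return max(pair_d) if linkage == "complete" else sum(pair_d) / len(pair_d)

    clusters = [[name] for name in names]
    while len(clusters) > k:
        # Find the closest pair of clusters under the chosen linkage and merge them.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]  # safe: combinations yields i < j
    return clusters


# Toy data: four individuals, five SNPs coded as 0/1/2 allele counts (made up).
example = {
    "ind1": [0, 1, 2, 0, 1],
    "ind2": [0, 1, 2, 0, 2],
    "ind3": [2, 1, 0, 2, 1],
    "ind4": [2, 1, 0, 2, 0],
}
for method in ("complete", "average"):
    print(method, agglomerate(example, k=2, linkage=method))
```

In R, `hclust` (whose `method` argument accepts "complete", "average" and "ward.D2") followed by `cutree` does the same job; the explicit loop here only makes the linkage rule visible.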

score 0 · Answer 1 · 2017-06-07

There are several ways to compare two sets of clustering results. Here are a few I can think of:
- Count the pairs that end up in the same cluster in the two approaches (or pairs that end up in different clusters).
- Use the Jaccard index.
- Use the Rand index.
- Use the Chi-squared statistic on the contingency table of the two clustering results (or any other measure of association that can be derived from a contingency table).
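The measures listed above can be sketched in plain Python as follows. This is a minimal illustration, assuming the MDS-derived groups and the cluster assignments are two equal-length lists in the same individual order (how you read the groups off the MDS plot is up to you). `pair_counts` implements the first bullet, the Rand and Jaccard indices are computed from those pair counts, and `chi_squared` builds the contingency table and returns Pearson's statistic with its degrees of freedom. All function names are hypothetical; `p_value_df1` is only valid with 1 degree of freedom (two groups in each labeling), and no continuity correction is applied.

```python
from collections import Counter
from itertools import combinations
from math import erfc, sqrt


def pair_counts(labels_a, labels_b):
    """Count pairs of individuals by whether each method groups them together.

    Returns (n11, n10, n01, n00): same in both, same only in A,
    same only in B, and different in both.
    """
    n11 = n10 = n01 = n00 = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            n11 += 1
        elif same_a:
            n10 += 1
        elif same_b:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def rand_index(labels_a, labels_b):
    """Fraction of pairs on which the two labelings agree."""
    n11, n10, n01, n00 = pair_counts(labels_a, labels_b)
    return (n11 + n00) / (n11 + n10 + n01 + n00)


def jaccard_index(labels_a, labels_b):
    """Pairs grouped together by both, over pairs grouped together by either."""
    n11, n10, n01, _ = pair_counts(labels_a, labels_b)
    return n11 / (n11 + n10 + n01)


def chi_squared(labels_a, labels_b):
    """Pearson chi-squared statistic of the contingency table and its dof."""
    n = len(labels_a)
    rows, cols = Counter(labels_a), Counter(labels_b)
    cells = Counter(zip(labels_a, labels_b))
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(cols) - 1)
    return stat, dof


def p_value_df1(stat):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return erfc(sqrt(stat / 2))


# Toy example: the same six individuals, labeled by each method (made up).
mds_groups = ["A", "A", "A", "B", "B", "B"]
cluster_groups = [2, 2, 1, 1, 1, 1]  # the third individual is assigned differently

print(pair_counts(mds_groups, cluster_groups))  # (4, 2, 3, 6)
print(rand_index(mds_groups, cluster_groups))
print(jaccard_index(mds_groups, cluster_groups))
print(chi_squared(mds_groups, cluster_groups))  # (3.0, 1)
```

None of these measures depend on what the groups are called, so "A"/"B" in one result versus 2/1 in the other does not matter. With 450 individuals there are 101,025 pairs, so the pairwise loop is cheap. The plain Rand index is not corrected for chance; the adjusted Rand index is a common alternative.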