Question: Multidimensional Scaling Analysis
4.4 years ago
User000 wrote:

I am analysing genotyping-by-sequencing data and have a .vcf file as the result. I have 17 individuals genotyped at different SNP loci, and I have filtered out SNP loci with more than 90% missing data. I was wondering whether it makes sense to run an MDS analysis in PLINK to see the distances between these 17 individuals. Using the same program I have also obtained IBS values and produced a dendrogram from them in R. Does this make sense? Any suggestions, comments or guidelines are appreciated.

P.S. The 17 individuals belong to the same species, but they were grown in different places.
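For readers who want to see what an IBS-based distance looks like when many calls are missing, here is a minimal sketch in Python. It assumes biallelic SNPs coded as alternate-allele counts (0, 1, 2) with None for a missing call; the function names and the toy genotypes are made up for illustration, and it is not meant to reproduce PLINK's output exactly.

```python
from itertools import combinations

def ibs_distance(g1, g2):
    """Return 1 - mean IBS over loci called in BOTH individuals.

    Genotypes are alternate-allele counts (0, 1, 2); None marks a missing call.
    Returns None when the two individuals share no called locus.
    """
    shared = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not shared:
        return None
    # At a biallelic locus the number of alleles shared IBS is 2 - |a - b|.
    ibs = sum(2 - abs(a - b) for a, b in shared) / (2 * len(shared))
    return 1.0 - ibs

def distance_matrix(genotypes):
    """Symmetric matrix of pairwise IBS distances (zero diagonal)."""
    n = len(genotypes)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = ibs_distance(genotypes[i], genotypes[j])
    return dist

# Hypothetical toy data: 3 individuals, 5 loci, some calls missing.
genotypes = [
    [0, 0, 2, None, 1],
    [0, 0, 2, 1, 1],
    [2, 2, 0, 1, None],
]
dist = distance_matrix(genotypes)
for row in dist:
    print(row)
```

Note that each pair is compared only on the loci called in both individuals, so with a lot of missing data different pairs can end up being compared on quite different sets of SNPs.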

A bit off-topic, but I bet that, as SNPs are quite sparse across 17 individuals, some more sophisticated clustering (one that compensates for the missing information by using known gene-gene relationships) is needed. I would recommend checking out this paper, which uses gene networks to perform a robust clustering of SNP profiles.

4.4 years ago
tommivat wrote:

This is a very common visualization problem, where the most important part is defining a proper similarity function to obtain a similarity matrix for your individuals. From a quick Google search, I found this paper, which may give you some starting points. Once you have generated a similarity matrix (you might already have a good one), you can use any visualization tool that takes such a matrix as input. My current favourite is t-SNE, which has turned out to work very well on many complex data sets. If you are satisfied with the MDS result, though, there is no reason to try any other (possibly more advanced) techniques.
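To make the "matrix in, coordinates out" step concrete, here is a rough sketch of classical (metric) MDS in Python with NumPy. It assumes you already have a symmetric distance matrix (for example 1 - similarity); the function name and the toy matrix are invented for illustration, and this is the textbook double-centring method rather than the exact routine of any particular tool.

```python
import numpy as np

def classical_mds(dist, n_components=2):
    """Classical MDS: embed a symmetric distance matrix in n_components dimensions."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Non-Euclidean distances can give small negative eigenvalues; clip them to zero.
    top = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top)

# Hypothetical toy distances: individuals 0 and 1 are close, as are 2 and 3.
dist = np.array([
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.9, 0.8],
    [0.8, 0.9, 0.0, 0.1],
    [0.9, 0.8, 0.1, 0.0],
])
coords = classical_mds(dist)
print(coords)
```

Each row of coords is one individual, so the two columns can be plotted directly as a scatter plot. The sign of each axis is arbitrary, so only the relative positions are meaningful.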

Thanks, I am having a look at the paper. I created a matrix using PLINK (there is a command line that does the whole job) and visualized it in R; I'm not sure if I can trust this analysis, though...
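One possible sanity check on whether the matrix can be trusted, given that loci with up to 90% missing data were kept, is to count how many loci each pair of individuals actually has in common. The sketch below assumes genotypes coded as 0/1/2 with None for missing calls; the function names, the toy data and the min_shared cut-off are all made up for illustration and would need to be chosen for the real data set.

```python
from itertools import combinations

def shared_call_counts(genotypes):
    """Map each pair (i, j) to the number of loci called in both individuals."""
    counts = {}
    for i, j in combinations(range(len(genotypes)), 2):
        counts[(i, j)] = sum(
            a is not None and b is not None
            for a, b in zip(genotypes[i], genotypes[j])
        )
    return counts

def weakly_supported_pairs(genotypes, min_shared):
    """Pairs whose distance rests on fewer than min_shared jointly called loci."""
    return [
        pair
        for pair, n in shared_call_counts(genotypes).items()
        if n < min_shared
    ]

# Hypothetical toy data: 3 individuals, 5 loci, None = missing call.
genotypes = [
    [0, 0, 2, None, 1],
    [0, 0, 2, 1, 1],
    [2, 2, 0, 1, None],
]
counts = shared_call_counts(genotypes)
weak = weakly_supported_pairs(genotypes, min_shared=4)
print(counts)
print(weak)
```

Pairs that show up in the weak list are the ones whose position in the MDS plot or dendrogram deserves the most caution.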

4.4 years ago
Devon Ryan
Freiburg, Germany
Devon Ryan wrote:

It'll never hurt to do an MDS; in fact, it can only be neutral or helpful.

In the literature, MDS is used to assign many individuals to several groups, whereas I have only 17 individuals with many SNP loci. So basically I can observe their distances and relationships on the basis of the SNPs?

It'll depend on the population structure. You might give it a go in any case. Also do have a look at the paper mikhail.shugay linked to.

