2.7 years ago
Chironex
Hi, I was trying to calculate a distance matrix from my dataset to generate a multidimensional scaling plot. However, the values produced by the dist() function when I calculate the distance matrix are too high. Is there a setting to change this?
d <- dist( data, diag = TRUE, upper = TRUE )
d
A_1 A_2 B_1 B_2 C_1 C_2 D_1 D_2 E_1 E_2
A_1 0.0000 68539.5393 33573.1135 50894.2207 70578.2982 20347.0745 70435.1249 38158.1997 59292.5268 69776.5004
A_2 68539.5393 0.0000 36886.1659 19122.1184 2111.8398 51918.3046 1978.4474 31921.6694 9647.1810 1376.4292
B_1 33573.1135 36886.1659 0.0000 18568.7180 38890.6770 18611.7210 38722.0872 7575.4019 27989.3274 38108.9009
B_2 50894.2207 19122.1184 18568.7180 0.0000 21092.5528 34135.1095 20920.8795 13725.5401 11183.2031 20331.0117
C_1 70578.2982 2111.8398 38890.6770 21092.5528 0.0000 53940.1809 367.0356 33946.4947 11618.8036 931.8314
C_2 20347.0745 51918.3046 18611.7210 34135.1095 53940.1809 0.0000 53801.9337 21530.5768 42965.9619 53150.4690
D_1 70435.1249 1978.4474 38722.0872 20920.8795 367.0356 53801.9337 0.0000 33780.6044 11486.1160 859.4139
D_2 38158.1997 31921.6694 7575.4019 13725.5401 33946.4947 21530.5768 33780.6044 0.0000 23204.5176 33160.8060
E_1 59292.5268 9647.1810 27989.3274 11183.2031 11618.8036 42965.9619 11486.1160 23204.5176 0.0000 10776.8957
E_2 69776.5004 1376.4292 38108.9009 20331.0117 931.8314 53150.4690 859.4139 33160.8060 10776.8957 0.0000
> mds <- cmdscale(d, k = 2, add = F)
> mds
[,1] [,2]
A_1 47358.298 5817.746
A_2 -20986.848 1126.265
B_1 15249.608 -3117.375
B_2 -2668.105 -3339.063
C_1 -23030.303 1128.455
C_2 30486.758 -2477.329
D_1 -22883.553 1029.804
D_2 10449.603 -3499.095
E_1 -11744.748 2139.385
E_2 -22230.711 1191.206
Thanks
Why are these values considered too high to plot? Regardless, it's the relationship between points that is important, not the numeric value. And since cmdscale works on a metric scale, you could divide all values in the output by 10000 and it would still be a valid visual representation of your data when plotted.
If you are really concerned about the values, you could also try other ordination analyses using the vegan library in R. You can implement a nonmetric MDS (metaMDS), where the numeric axis scales are meaningless.
As an aside, why are there duplicates of each number in your matrix?
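To illustrate the rescaling point above, here is a minimal sketch in base R. The data are made up (a hypothetical 10 x 50 matrix standing in for the real dataset), and the factor 10000 is simply the one mentioned above. It checks that dividing the distances by a constant changes only the axis units of the cmdscale output, not the relative positions of the samples.

```r
# Hypothetical stand-in data: 10 samples (rows) x 50 features (columns).
set.seed(1)
data <- matrix(rpois(10 * 50, lambda = 2000), nrow = 10,
               dimnames = list(paste0(rep(LETTERS[1:5], each = 2), "_", 1:2), NULL))

d <- dist(data)                           # Euclidean distances between rows
mds <- cmdscale(d, k = 2)                 # classical (metric) MDS
mds_scaled <- cmdscale(d / 10000, k = 2)  # same analysis on rescaled distances

# Inter-point distances in the two configurations differ only by the
# factor 10000, so both plots show the same arrangement of samples.
same_shape <- isTRUE(all.equal(as.vector(dist(mds)),
                               as.vector(dist(mds_scaled)) * 10000))
print(same_shape)
```

Plotting either mds or mds_scaled with plot() should therefore give the same picture with differently labelled axes.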
Hi, I'm sorry, there are no duplicate names. I substituted letters to make it easier to understand (1 and 2 are different days, shown here just for illustration).
I think the problem arises from the beginning, because I aggregated single-cell RNA-seq samples into pseudobulks, creating this matrix:
I have more than 25000 genes; here I show only ten. As you can see, the expression profile in each sample is different, because the number of cells in each sample is different:
NUMBER OF CELLS:
So I need to normalize them. Can you suggest a way to do this before calculating the distance matrix?
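As a rough illustration of why cell number matters here, the sketch below uses invented counts (the matrix counts and the sample names S1 to S3 are hypothetical, not the data from this thread). It applies one common option, library-size scaling to counts per million followed by a log transform, before calling dist(). This is only one possible approach; dedicated packages such as edgeR or DESeq2 offer more refined normalization.

```r
# Hypothetical pseudobulk counts: 4 genes (rows) x 3 samples (columns).
# S2 has the same composition as S1 but 10x the total counts (more cells).
counts <- cbind(S1 = c(10, 20, 30, 40),
                S2 = c(100, 200, 300, 400),
                S3 = c(40, 30, 20, 10))
rownames(counts) <- paste0("gene", 1:4)

# dist() works on rows, so transpose to compare samples.
# Raw distances are dominated by the difference in total counts.
d_raw <- dist(t(counts))

# Counts per million: divide each column by its library size,
# then log-transform to dampen the influence of highly expressed genes.
lib_size <- colSums(counts)
cpm <- sweep(counts, 2, lib_size, "/") * 1e6
log_cpm <- log2(cpm + 1)

d_norm <- dist(t(log_cpm))
print(d_raw)
print(d_norm)
```

In this toy case S1 and S2 are far apart before normalization purely because of their totals, and coincide afterwards, while S3 stays distinct because its composition really differs.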