Question

Finding Spearman's rho correlation matrix

4.6 years ago

Natasha ▴ 40

Hi All,

This is a follow-up to my previous post here.

I intend to cluster tissues based on gene expression levels. I am trying to replicate Figure 1 of this paper.

Based on the inputs given in my previous post, I have converted the input data into the following format, using categorical gene expression levels for more than 1000 genes. For illustration, only two columns (Ensembl gene IDs) are shown.

                  ENSG00000000003 ENSG00000000419 ....
adrenal gland            1.000000        4.000000 ...
appendix                 2.000000        3.500000 ...
bone marrow              1.000000        3.000000 ...
breast                   2.000000        3.000000 ...
bronchus                 4.000000        3.000000 ...
caudate                  1.000000        2.500000 ...

From the above data, I'd like to compute the Spearman's rho correlation matrix and convert it into a distance measure for clustering.
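
One way to turn a Spearman matrix into a distance for clustering in R is sketched below. It assumes the gene x tissue layout of the sample data posted later in this thread (genes as rows, tissues as columns), because cor() correlates columns. Using 1 - rho as the distance and average linkage in hclust() are assumptions for illustration only; the thread does not say which distance or linkage Figure 1 of the paper used, and a variant such as 1 - abs(rho) would treat strong negative correlation as similarity.

```r
# Sample data from this thread: rows = genes, columns = tissues.
# cor() correlates columns, so tissues must be the columns.
df <- data.frame(
  "adrenal gland" = c(1, 4, 1, 3),
  "appendix"      = c(2.0, 3.5, 1.5, 1.5),
  "bone marrow"   = c(1, 3, 2, 2),
  "breast"        = c(2.0, 3.0, 2.666667, 3.0),
  "bronchus"      = c(4, 3, 1, 3),
  row.names = c("ENSG00000000003", "ENSG00000000419",
                "ENSG00000000457", "ENSG00000000460"),
  check.names = FALSE
)

rho <- cor(df, method = "spearman")   # tissue x tissue Spearman matrix
d   <- as.dist(1 - rho)               # assumption: distance = 1 - rho (0 = identical ranking)
hc  <- hclust(d, method = "average")  # assumption: average linkage

print(round(as.matrix(d), 3))
print(hc$labels[hc$order])            # tissues in dendrogram order
```

plot(hc) draws the dendrogram, and cutree(hc, k = ...) assigns cluster labels once a number of clusters is chosen.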

Could someone explain how Spearman's rho is computed? (I looked at the built-in R functions suggested in my previous post, but I would like to understand how the computation works.)
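
A minimal sketch of the computation itself: Spearman's rho is the Pearson correlation of the ranks, so each variable is ranked separately and the two rank vectors are then correlated. The example uses the bone marrow and breast columns from the sample data later in this thread. The variable names are illustrative. R's rank() gives tied values their average rank by default, and cor(..., method = "spearman") also ranks the data before correlating it, which is why the two results agree.

```r
# Two tissue columns from the sample data in this thread
bone_marrow <- c(1, 3, 2, 2)
breast      <- c(2.0, 3.0, 2.666667, 3.0)

# Step 1: rank each variable separately (ties get their average rank)
r1 <- rank(bone_marrow)   # 1.0 4.0 2.5 2.5
r2 <- rank(breast)        # 1.0 3.5 2.0 3.5

# Step 2: Pearson correlation of the ranks, written out by hand
d1 <- r1 - mean(r1)
d2 <- r2 - mean(r2)
rho_manual <- sum(d1 * d2) / sqrt(sum(d1^2) * sum(d2^2))

rho_builtin <- cor(bone_marrow, breast, method = "spearman")
print(c(manual = rho_manual, builtin = rho_builtin))  # both 0.8333333
```

The value matches the bone marrow / breast entry in the cor() output posted further down.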

score 0 · Answer 1 · 2019-09-04

4.6 years ago

Alex Reynolds 35k

There's a great explanation of how it is calculated here: https://en.m.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient
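
When no values are tied, the formula on the linked page, rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the difference between the two ranks of observation i and n is the number of observations, gives the same number as the Pearson correlation of the ranks. The vectors below are made up purely to check that equivalence.

```r
# Made-up vectors with no ties, used only to check the shortcut formula
x <- c(10, 20, 30, 40, 50)
y <- c(1, 3, 2, 5, 4)

n <- length(x)
d <- rank(x) - rank(y)                         # rank differences d_i
rho_formula <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

rho_builtin <- cor(x, y, method = "spearman")  # Pearson correlation of the ranks
print(c(formula = rho_formula, builtin = rho_builtin))  # both 0.8
```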

Many thanks for the link. I read through the explanation. I'd like to ask for clarification on how to interpret the computed correlation matrix.

The following sample data is used:

df


                adrenal gland appendix bone marrow   breast bronchus
ENSG00000000003             1      2.0           1 2.000000        4
ENSG00000000419             4      3.5           3 3.000000        3
ENSG00000000457             1      1.5           2 2.666667        1
ENSG00000000460             3      1.5           2 3.000000        3

Using corr <- cor(df, method = "spearman"), the following output is obtained:

              adrenal gland   appendix bone marrow      breast   bronchus
adrenal gland     1.0000000 0.50000000   0.8333333  0.88888889  0.0000000
appendix          0.5000000 1.00000000   0.3333333  0.05555556  0.5000000
bone marrow       0.8333333 0.33333333   1.0000000  0.83333333 -0.5000000
breast            0.8888889 0.05555556   0.8333333  1.00000000 -0.3333333
bronchus          0.0000000 0.50000000  -0.5000000 -0.33333333  1.0000000

From what I understand, the above matrix is computed as t(df) %*% df (the transpose of df times df), which gives a tissue x tissue matrix with variances on the diagonal and covariances off the diagonal. Could you please explain how this matrix should be interpreted?

Natasha ▴ 40 · 4.6 years ago
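
A note on reading the matrix: entry (i, j) is the Spearman correlation between tissue i and tissue j computed across the genes (rows). The matrix is therefore symmetric, its diagonal is exactly 1 (each tissue compared with itself) rather than a variance, values near 1 mean two tissues order the genes similarly, and values near -1 mean they order them in opposite ways. It is not t(df) %*% df on the raw values. The sketch below reproduces cor(df, method = "spearman") by ranking within each tissue, standardizing each ranked column with scale(), and taking t(Z) %*% Z / (n - 1), where n is the number of genes. It is meant to show the structure of the calculation, not R's internal code.

```r
# Sample data from this thread: rows = genes, columns = tissues
df <- data.frame(
  "adrenal gland" = c(1, 4, 1, 3),
  "appendix"      = c(2.0, 3.5, 1.5, 1.5),
  "bone marrow"   = c(1, 3, 2, 2),
  "breast"        = c(2.0, 3.0, 2.666667, 3.0),
  "bronchus"      = c(4, 3, 1, 3),
  check.names = FALSE
)

R <- apply(df, 2, rank)               # rank the genes within each tissue (column)
Z <- scale(R)                         # center each column, divide by its standard deviation
n <- nrow(Z)                          # number of genes
rho_manual <- crossprod(Z) / (n - 1)  # same as t(Z) %*% Z / (n - 1)

print(round(rho_manual, 4))
print(all.equal(rho_manual, cor(df, method = "spearman"),
                check.attributes = FALSE))  # TRUE
```

So t(...) %*% (...) does appear, but only after ranking and standardizing; that standardization is what turns covariances into correlations and puts 1s on the diagonal.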

Also, the linked page gives a formula that applies when all the ranks are distinct. Could you please explain how to assign ranks when the values of a variable are not distinct (e.g. the data stored in df)?

Natasha ▴ 40 · 4.6 years ago
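
For tied values, the usual convention (and the default ties.method = "average" in R's rank()) is to give each tied value the average of the rank positions it occupies. In the adrenal gland column (1, 4, 1, 3), the two 1s would occupy positions 1 and 2, so each gets 1.5. With ties present, the shortcut formula from the linked page is no longer exact, and Spearman's rho is computed as the Pearson correlation of these average ranks instead. The sketch below uses the adrenal gland and appendix columns from df to show the difference.

```r
x <- c(1, 4, 1, 3)          # adrenal gland (two genes tied at 1)
y <- c(2.0, 3.5, 1.5, 1.5)  # appendix (two genes tied at 1.5)

# Tied values share the average of the positions they occupy
rx <- rank(x, ties.method = "average")   # 1.5 4.0 1.5 3.0
ry <- rank(y, ties.method = "average")   # 3.0 4.0 1.5 1.5

n <- length(x)
d <- rx - ry
shortcut <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))  # exact only when there are no ties
pearson_on_ranks <- cor(rx, ry)                 # Spearman's rho with ties handled

print(c(shortcut = shortcut, pearson_on_ranks = pearson_on_ranks))
# shortcut = 0.55, pearson_on_ranks = 0.5
```

The 0.5 matches the adrenal gland / appendix entry of the cor(df, method = "spearman") output above, while the shortcut formula gives 0.55 because of the ties.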