Question

Is a high correlation between two or more genes indicative of similar localization?

7.0 years ago

chinsanity • 0

I have a two-part question. I am analysing mRNA expression data, using the TCGA datasets to check Pearson's correlation between my gene of interest and the other available genes.

1) I am using the median z-scores calculated by the TCGA pipeline to check correlation between genes. Would it be useful to utilise raw expression data instead of z-scores to calculate correlation?

2) Does a high correlation (>0.6) indicate that the genes in question have a similar role or are involved in the same biochemical process? The exact function of my gene of interest isn't established yet, so I am trying to at least guess where it could potentially be involved.

Regards

Correlation Expression • 2.1k views

ADD COMMENT • link updated 7.0 years ago by Lars Juhl Jensen 11k • written 7.0 years ago by chinsanity • 0

I would not be confident calling anything below 0.8 a high correlation.

ADD REPLY • link 7.0 years ago by Santosh Anand 5.7k

score 1 · Answer 1 · 2017-04-24

If the z-scores were calculated separately for each gene by simply subtracting the mean and dividing by the standard deviation, it should not make any difference at all for Pearson's correlation. By that I mean that the results should be identical, because Pearson's correlation is invariant to offset and scale.
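To make the invariance concrete, here is a minimal sketch in Python with NumPy. The expression values are simulated, and `gene_a`, `gene_b` and `zscore` are hypothetical names chosen for illustration; nothing here reproduces the actual TCGA pipeline. It assumes the z-score of each gene is a plain subtract-the-mean, divide-by-the-standard-deviation transform.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated (not real) raw expression values for two genes across 200 samples.
gene_a = rng.lognormal(mean=5.0, sigma=1.0, size=200)
gene_b = 0.5 * gene_a + rng.lognormal(mean=4.0, sigma=1.0, size=200)


def zscore(x):
    """Per-gene z-score: subtract the mean, divide by the standard deviation."""
    return (x - x.mean()) / x.std(ddof=1)


# Pearson's r on the raw values and on the z-scored values.
r_raw = np.corrcoef(gene_a, gene_b)[0, 1]
r_z = np.corrcoef(zscore(gene_a), zscore(gene_b))[0, 1]

# The two coefficients agree up to floating-point error.
print(np.isclose(r_raw, r_z))
```

The same holds for any per-gene offset and positive scale, so it does not matter which mean and standard deviation were used, as long as one pair is applied to all samples of a gene. A non-linear step such as a log transform is a different matter and would generally change the coefficient.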

My general experience with coexpression is that it is quite weak evidence for functional associations. However, that obviously depends a lot on the organism, the expression conditions, and the function in question.

I agree with the comment that 0.6 is not a high correlation, but I will not provide another arbitrary cutoff instead. What constitutes a high correlation depends on all the factors listed above as well as on the number of samples across which you calculate the correlation, and how well those samples are normalized to each other (poor normalization can lead to artificially high correlations).
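The normalization point can be illustrated with a small simulation. This is only a sketch under stated assumptions: two genes are generated independently, and a hypothetical per-sample factor `depth` (standing in for something like uncorrected sequencing depth) multiplies both. All names and distributions are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 500

# Two genes simulated as independent of each other.
gene_a = rng.normal(100.0, 10.0, n_samples)
gene_b = rng.normal(100.0, 10.0, n_samples)

# Hypothetical per-sample scaling factor that was never normalized away.
depth = rng.uniform(0.5, 2.0, n_samples)

# Correlation when samples are comparable versus when each sample
# carries its own uncorrected scaling factor.
r_normalized = np.corrcoef(gene_a, gene_b)[0, 1]
r_unnormalized = np.corrcoef(gene_a * depth, gene_b * depth)[0, 1]

print(f"well normalized:   r = {r_normalized:.2f}")
print(f"poorly normalized: r = {r_unnormalized:.2f}")
```

In this setup the shared factor dominates the variance of both genes, so the unnormalized correlation comes out high even though the genes are unrelated by construction. Any cutoff applied to such data would pass most gene pairs.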