Question: How to compute Spearman correlation between numeric and non-numeric variables?
6 months ago, prithvi.mastermind10 wrote:

I am working on a correlation analysis and am stuck on how to calculate the correlation between numerical and non-numerical data. In the sample data below, correlation is usually computed between two numerical variables, e.g. between the mRNA expression of MYC and the expression of TP53. But how do I calculate the correlation between a numeric and a non-numeric variable, e.g. between mRNA expression and TP53 mutation status? Is there a method or package in R that computes the correlation coefficient r along with a p-value? Any help would be highly appreciated.

Sample  MYC exp    TP53 exp   TP53 status
1       11.68583   6.338739   WT
2       9.668901   6.192507   MUT
3       8.080415   6.404516   WT
4       8.296929   6.869241   WT
5       9.446335   6.337951   MUT
6       7.958141   5.419711   MUT
7       7.423971   5.992706   WT
8       7.394608   6.16542    WT
9       10.97504   6.220372   MUT
10      5.756091   6.411477   WT

Tags: gene, R, genome
6 months ago, dsull (UCLA) wrote:

When both variables are numerical, use cor.test() in R to get Pearson's r and its p-value.

For your WT/MUT categories, all you have to do is code WT = 0 and MUT = 1, then calculate Pearson's r as usual (this is called the point-biserial correlation) and get the p-value as you normally would. The p-value is derived from a t-statistic, so you are technically performing a t-test.
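As a rough illustration of the 0/1 coding, here is a minimal sketch in Python using only the standard library (the thread itself refers to R's cor.test; Python is used here only to show the arithmetic). It takes the TP53 exp and TP53 status columns from the sample data in the question, codes WT = 0 and MUT = 1, computes Pearson's r with a hypothetical helper `pearson_r`, and converts r into the t-statistic (n - 2 degrees of freedom) on which the p-value is based. It does not compute the p-value itself.

```python
import math

# TP53 expression and TP53 status for the 10 samples in the question.
tp53_exp = [6.338739, 6.192507, 6.404516, 6.869241, 6.337951,
            5.419711, 5.992706, 6.16542, 6.220372, 6.411477]
status = ["WT", "MUT", "WT", "WT", "MUT", "MUT", "WT", "WT", "MUT", "WT"]


def pearson_r(x, y):
    """Hypothetical helper: plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Code WT = 0 and MUT = 1, then treat the 0/1 column like any numeric variable.
coded = [0 if s == "WT" else 1 for s in status]
r = pearson_r(coded, tp53_exp)  # this is the point-biserial correlation

# t-statistic for H0: rho = 0, with n - 2 degrees of freedom.
n = len(tp53_exp)
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(f"r = {r:.4f}, t = {t:.4f}, df = {n - 2}")
```

In R, running cor.test() on the expression values and the same 0/1 column should report this r and t together with the p-value.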

Personally, I don't find this correlation metric very informative. It's much more informative to compare the mean of the WT group with the mean of the MUT group: from there you can see how different the two means are, or report a fold change between WT and MUT.
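A minimal sketch of the group comparison suggested above, again in Python with the standard library and the TP53 columns from the question's sample data. The fold-change line assumes the expression values are log2-transformed; the question does not say so, and on a linear scale you would divide the two means instead.

```python
from statistics import mean

# TP53 expression and TP53 status for the 10 samples in the question.
tp53_exp = [6.338739, 6.192507, 6.404516, 6.869241, 6.337951,
            5.419711, 5.992706, 6.16542, 6.220372, 6.411477]
status = ["WT", "MUT", "WT", "WT", "MUT", "MUT", "WT", "WT", "MUT", "WT"]

# Split the expression values by mutation status.
wt = [x for x, s in zip(tp53_exp, status) if s == "WT"]
mut = [x for x, s in zip(tp53_exp, status) if s == "MUT"]

mean_wt, mean_mut = mean(wt), mean(mut)
diff = mean_mut - mean_wt

# ASSUMPTION: values are log2 expression, so a difference of means
# corresponds to a fold change of 2 ** diff (MUT relative to WT).
fold_change = 2 ** diff

print(f"WT mean = {mean_wt:.4f} (n = {len(wt)})")
print(f"MUT mean = {mean_mut:.4f} (n = {len(mut)})")
print(f"MUT - WT = {diff:.4f}, fold change = {fold_change:.3f}")
```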

EDIT: I misread the title. If you specifically want Spearman's correlation, that's not possible.

Thanks, dsull, for your valuable suggestion.

I'm using GraphPad Prism for the correlation analysis. Is it necessary to calculate the p-values by running a t-test, or are the p-values that Prism generates automatically for the Pearson correlation good to go?

The p-values should be good to go and, in fact, should be identical to the p-values you get from running a two-sample Student's t-test.
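To check that the two p-values agree, this sketch (Python, standard library only, TP53 columns from the question's sample data) computes Student's pooled-variance two-sample t-statistic and the t-statistic derived from the point-biserial r. The two match up to floating-point error, and both use n - 2 = 8 degrees of freedom, so they yield the same p-value. Note that a Welch t-test (the default of R's t.test) uses a different variance estimate and would generally not match.

```python
import math
from statistics import mean, variance

# TP53 expression and TP53 status for the 10 samples in the question.
tp53_exp = [6.338739, 6.192507, 6.404516, 6.869241, 6.337951,
            5.419711, 5.992706, 6.16542, 6.220372, 6.411477]
status = ["WT", "MUT", "WT", "WT", "MUT", "MUT", "WT", "WT", "MUT", "WT"]

wt = [x for x, s in zip(tp53_exp, status) if s == "WT"]
mut = [x for x, s in zip(tp53_exp, status) if s == "MUT"]
n0, n1 = len(wt), len(mut)

# Student's two-sample t (equal variances, pooled), df = n0 + n1 - 2.
sp2 = ((n0 - 1) * variance(wt) + (n1 - 1) * variance(mut)) / (n0 + n1 - 2)
t_student = (mean(mut) - mean(wt)) / math.sqrt(sp2 * (1 / n0 + 1 / n1))

# Same data as a point-biserial correlation (WT = 0, MUT = 1), df = n - 2.
coded = [0 if s == "WT" else 1 for s in status]
n = len(coded)
mx, my = mean(coded), mean(tp53_exp)
sxy = sum((a - mx) * (b - my) for a, b in zip(coded, tp53_exp))
sxx = sum((a - mx) ** 2 for a in coded)
syy = sum((b - my) ** 2 for b in tp53_exp)
r = sxy / math.sqrt(sxx * syy)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(f"Student t = {t_student:.6f}, t from r = {t_from_r:.6f}")
```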