Creating Correlation Plot using distance method from a data.frame
3.1 years ago

I would like to create a correlation plot showing both the correlation coefficients and the P values, but my correlation output is a data.frame, not a matrix. How can I build a correlation matrix manually, without cor.test? My correlation method is the "distance" method, which is not available in cor.test, so I used the "correlation" package instead. It gives me the coefficients and the P values, but as a table, and I cannot create a corrplot from that. I tried the functions matrix.data and as.matrix; both give the error "The matrix is not in [-1, 1]!". Could someone help me create a correlation plot using the distance method?

R correlation corrplot • 1.5k views

Can you add the data frame to your post, for example as the output of dput(df)?


Unfortunately, I am not allowed to share the data frame in a public forum, but I will attach an image of what the output looks like. When opened, the data frame has the coefficient values (r) stored as numeric and the P values stored as numeric too. [Screenshot of the data frame; image not shown]
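A table like the one described here, with one row per pair of variables plus an r column and a P-value column, has to be pivoted into a square matrix before a correlation plot can use it. One plausible cause of the "not in [-1, 1]" error, which cannot be confirmed without the data, is that the whole table (P values and any other columns included) was converted rather than the coefficients alone. Below is a minimal, dependency-free Python sketch of that pivot. The column names Parameter1, Parameter2, r and p, the helper name long_to_square, and all numbers are assumptions made up for illustration.

```python
def long_to_square(rows, value_key, name_keys=("Parameter1", "Parameter2"), diagonal=1.0):
    """Pivot long-format pair rows into (labels, symmetric square matrix).

    rows      -- list of dicts, one per variable pair (hypothetical layout)
    value_key -- which column to place in the matrix, e.g. "r" or "p"
    diagonal  -- value for each variable paired with itself
    """
    first, second = name_keys

    # Collect variable names in order of first appearance.
    labels = []
    for row in rows:
        for name in (row[first], row[second]):
            if name not in labels:
                labels.append(name)

    index = {name: i for i, name in enumerate(labels)}
    n = len(labels)

    # Start with an empty matrix; pairs missing from the table stay None.
    matrix = [[None] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = diagonal

    # Each pair fills both the upper and the lower triangle.
    for row in rows:
        i, j = index[row[first]], index[row[second]]
        matrix[i][j] = matrix[j][i] = row[value_key]
    return labels, matrix


# Made-up example rows; these are NOT the poster's data.
pairs = [
    {"Parameter1": "f1", "Parameter2": "f2", "r": 0.12, "p": 0.40},
    {"Parameter1": "f1", "Parameter2": "f3", "r": 0.35, "p": 0.01},
    {"Parameter1": "f2", "Parameter2": "f3", "r": 0.74, "p": 0.001},
]

labels, r_matrix = long_to_square(pairs, "r", diagonal=1.0)  # coefficients
_, p_matrix = long_to_square(pairs, "p", diagonal=0.0)       # P values

for name, row in zip(labels, r_matrix):
    print(name, row)
```

Building the coefficient matrix and the P-value matrix separately keeps every entry of the first one inside [-1, 1]. With pandas, DataFrame.pivot(index=..., columns=..., values=...) does the same reshaping for one triangle, which then has to be mirrored.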

3.1 years ago
Mensur Dlakic ★ 27k

There is a simple function in pandas to calculate column correlations, and then another matplotlib function that will make a plot out of it. I suspect there must be something similar in R as well.

Most of the code below is used for creating 5 random data columns with 100 points each. Since none of them would be highly correlated, I made columns f2 and f5 artificially similar to each other.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sbn

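# Five columns of 100 uniform random values each (seeded, so reproducible)
df = pd.DataFrame(np.random.RandomState(101).rand(100, 5), columns=['f1','f2','f3','f4','f5'])
# Make f2 a noisy copy of f5 so that one pair is clearly correlated (this noise is unseeded)
df['f2'] = df['f5'] + np.random.uniform(-0.5, 0.5, size=len(df))
corr = df.corr()
print(corr)
# Styler output is only rendered in a notebook; format(precision=2) replaces set_precision, which was removed in pandas 2.0
corr.style.background_gradient(cmap='coolwarm').format(precision=2)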
plt.figure(figsize=(8,8))
sbn.heatmap(corr, annot=True)
plt.tight_layout()
plt.show()

It prints out the correlations:

          f1        f2        f3        f4        f5
f1  1.000000 -0.029128 -0.125591 -0.048376  0.034170
f2 -0.029128  1.000000  0.143715 -0.167495  0.738201
f3 -0.125591  0.143715  1.000000 -0.068780  0.137218
f4 -0.048376 -0.167495 -0.068780  1.000000 -0.116675
f5  0.034170  0.738201  0.137218 -0.116675  1.000000

And here is the plot:

[Heatmap of the correlation matrix; image not shown]


Thanks a lot for the code. If I may ask, does the corr function in this example use the Pearson method? I wanted to compute the correlation with the "distance" method.

3.1 years ago
Mensur Dlakic ★ 27k

You will need to install the dcor package to calculate distance correlations. The code below shows how to evaluate any pairwise function for every pair of columns, collect the results in a symmetric matrix, and present that matrix as a heatmap.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sbn
import dcor

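df = pd.DataFrame(np.random.RandomState(101).rand(100, 5), columns=['f1','f2','f3','f4','f5'])
# Make f2 a noisy copy of f5 so that one pair is clearly related (this noise is unseeded)
df['f2'] = df['f5'] + np.random.uniform(-0.5, 0.5, size=len(df))

# Distance correlation: fill a square frame one column pair at a time
dcorr = pd.DataFrame(index=df.columns, columns=df.columns, dtype=float)
for r in df.columns:
    for c in df.columns:
        # .loc avoids chained assignment (dcorr[r][c] = ...), which may not update the frame in recent pandas
        dcorr.loc[r, c] = dcor.distance_correlation(df[r].to_numpy(), df[c].to_numpy())

corr = dcorr.astype(np.float32)
print(corr)
# Styler output is only rendered in a notebook; format(precision=2) replaces set_precision, which was removed in pandas 2.0
corr.style.background_gradient(cmap='coolwarm').format(precision=2)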
plt.figure(figsize=(8,8))
sbn.heatmap(corr, annot=True)
plt.tight_layout()
plt.show()
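For readers who want to see what a distance correlation measures before installing anything, here is a minimal pure-Python sketch of the biased (V-statistic) sample distance correlation for two one-dimensional samples. The function names are illustrative and are not part of dcor. The sketch is O(n²) in time and memory, is meant for understanding rather than production use, and its agreement with dcor.distance_correlation has not been checked here.

```python
import math


def _centered_distances(values):
    """Pairwise |a - b| distance matrix with row, column and grand means removed."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def _mean_product(a, b):
    """Mean of the element-wise product of two n x n matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Biased sample distance correlation of two equal-length 1-D samples."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = _mean_product(a, b)    # squared distance covariance
    dvar2_x = _mean_product(a, a)  # squared distance variance of x
    dvar2_y = _mean_product(b, b)  # squared distance variance of y
    denom = math.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        return 0.0                 # a constant sample has no dependence to measure
    return math.sqrt(max(dcov2, 0.0) / denom)


# y is a deterministic but non-linear function of x; Pearson's r is 0 for this data.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]
print(distance_correlation(x, y))
print(distance_correlation(x, x))
```

Unlike Pearson's r, the result lies in [0, 1], so a heatmap of distance correlations has no negative cells, and a plotting function that colors on a [-1, 1] scale will only use half of its range.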