Hello, I am trying to calculate a correlation coefficient, but the script I wrote gives me a syntax error.
I have some data and want to see how strongly two of its columns, logAUC and RMSD, are correlated.
I cannot figure out what causes the Python syntax error or how to fix it.
My code looks like this:
    %matplotlib inline
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    plt.rcParams['figure.figsize'] = (20.0, 10.0)

    #READING data
    data = pd.read_csv ('benchmarking.csv')
    print (data.shape)
    data.head()

    #Collecting X and Y
    X = data['logAUC'].values
    Y = data['RMSD'].values

    #Mean X and Y
    mean_x = np.mean(X)
    mean_y = np.mean(Y)
    print (mean_x, mean_y)

    #Total number of values
    n = len(X)

    # Using the formula to calculate b1 and b2
    numer = 0
    denom = 0
    for i in range(m):
        numer += (X[i] - mean_x * (Y[i] - mean_y)
        denom += (X[i] - mean_x) ** 2
    b1 = numer/denom
    b0 = mean_y - (b1 * mean_x)
    print (b1, b0)
This is the error I get:
        denom += (X[i] - mean_x) ** 2
            ^
    SyntaxError: invalid syntax
My input data looks like this:
       Protein name     logAUC  RMSD
    0  Metaloellastase  47.96   0.61
    1  FGF1             23.44   0.72
    2  FKBP1A           38.98   1.16
    3  UDP              15.45   0.58
    4  MDM2             18.91   1.42
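
A possible fix, as a minimal sketch rather than a definitive answer: the `numer` line opens two parentheses but closes only one, and Python typically reports an unclosed parenthesis as a syntax error on the following line, which is why the message points at `denom`. The loop also uses `range(m)`, while the length was stored in `n`. Note that `b1` and `b0` are the slope and intercept of a least-squares line, not a correlation coefficient; `np.corrcoef` is one way to get the Pearson coefficient. The five rows from the sample data are inlined here as an assumption so the sketch runs without `benchmarking.csv`, and the name `r` is introduced only for illustration.

```python
import numpy as np

# Assumption: the five sample rows are inlined instead of reading benchmarking.csv.
X = np.array([47.96, 23.44, 38.98, 15.45, 18.91])  # logAUC
Y = np.array([0.61, 0.72, 1.16, 0.58, 1.42])       # RMSD

mean_x = np.mean(X)
mean_y = np.mean(Y)
n = len(X)  # total number of values

# Least-squares slope (b1) and intercept (b0)
numer = 0.0
denom = 0.0
for i in range(n):  # n, not the undefined name m
    numer += (X[i] - mean_x) * (Y[i] - mean_y)  # closing parenthesis restored
    denom += (X[i] - mean_x) ** 2

b1 = numer / denom
b0 = mean_y - b1 * mean_x

# Pearson correlation coefficient between X and Y
# (r is a hypothetical name, not part of the original script)
r = np.corrcoef(X, Y)[0, 1]

print(b1, b0, r)
```

With the real file, `X` and `Y` would come from `data['logAUC'].values` and `data['RMSD'].values` as in the original script.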