Question

Kernel Ridge Regression in R for Drug-Target Interaction based on Kronecker Products computed from Tanimoto Kernels and Smith-Waterman Scores

0

9.2 years ago

' ▴ 330

I want to run kernel ridge regression on a set of kernels I have computed, but I do not know how to do this in R. I found the constructKRRLearner function from the CVST package, but the manual is not clear to me, as I am a complete beginner in machine learning. The function needs an x and a y, but I have no idea what to pass there, as I only have a data frame holding the pairwise kernel, computed as the Kronecker product of the drug and protein kernels.

How can I do a Kernel Ridge Regression task in R?

Ideally I also want to visualize my data points and then illustrate the regression line on the plot! For instance like this:

http://scikit-learn.org/stable/_images/plot_kernel_ridge_regression_0011.png

MORE INFO ON MY DATASET

I have a drug-target interaction (DTI) data set. It comprises 100 drug compounds (rows) and 100 protein kinase targets (columns). There are some NaNs (missing values) in this data set. The values reflect how tightly a compound binds to a target.

I have drugs' SMILES and CHEMBL IDs.

I have the protein's (targets) sequences and UNIPROT IDs.

For drugs [100 drugs]: I converted the drug SMILES to an SDFset and then computed fingerprints for each drug using OpenBabel. Based on these fingerprints I computed Tanimoto kernels for all pairs of drugs (using the "fpSim" function), e.g. Drug 1 with Drug 2, 3, 4, ... 100, then Drug 2 with Drug 1, 3, 4, ... 100, and so on until Drug 99 with Drug 100. I named this BASE_DRUG_KERNELS.

For proteins: I had the protein sequences, so I computed Smith-Waterman scores for all pairs of proteins, e.g. Protein 1 with Protein 2, 3, ... 100, then Protein 2 with Protein 1, 3, 4, ... 100, and so on until Protein 99 with Protein 100. I named this BASE_PROTEIN_KERNELS.

Then I computed the Kronecker product of BASE_DRUG_KERNELS and BASE_PROTEIN_KERNELS, which gave me a matrix of 100,000,000 elements (10,000 × 10,000). I named this matrix KRONECKER_PRODUCTS.

I wish to run Kernel Ridge Regression on the matrix KRONECKER_PRODUCTS.
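One way to line the data up for kernel ridge regression can be sketched with small random stand-ins for the real matrices. The names Kd, Kp and Y and the toy sizes are hypothetical; only base R's kronecker() is assumed. The response y is the binding-affinity matrix flattened in the same drug-protein order as the rows of the Kronecker product, and pairs with a missing affinity are dropped from both y and the kernel.

```r
set.seed(1)
nd <- 4; np <- 3                       # toy sizes; the real data would be 100 and 100

# Stand-in symmetric similarity matrices (hypothetical; replace with the real kernels)
Kd <- crossprod(matrix(rnorm(nd * nd), nd))   # drug-drug kernel, nd x nd
Kp <- crossprod(matrix(rnorm(np * np), np))   # protein-protein kernel, np x np

# Stand-in affinity matrix: drugs in rows, proteins in columns, with some NAs
Y <- matrix(rnorm(nd * np), nd, np)
Y[c(2, 7)] <- NA

# Pairwise kernel: row (i - 1) * np + k corresponds to the pair (drug i, protein k)
K <- kronecker(Kd, Kp)

# Flatten Y in the same order: the protein index varies fastest within each drug
y <- as.vector(t(Y))

# Keep only the pairs with a measured affinity
obs <- !is.na(y)
K_train <- K[obs, obs]
y_train <- y[obs]
```

With this layout, K_train and y_train play the roles of the kernel matrix and the target vector in the regression; the pairs with missing values are the natural candidates for prediction.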

R machine learning kernel regression • 3.9k views

updated 9.2 years ago by Jean-Karim Heriche 27k • written 9.2 years ago by ' ▴ 330

score 2 · Accepted Answer · 2016-04-23

2

9.2 years ago

Jean-Karim Heriche 27k

In kernel ridge regression, the parameters are α = (K + λI)^(−1) y, where K is the kernel matrix, so you can get them with alpha <- solve(K, y).
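A minimal sketch of that formula with the ridge term written out, on toy data. The linear kernel, the sizes and the value of lambda are assumptions made for illustration. Calling solve(K, y) on its own corresponds to λ = 0 and may fail when K is singular, as the rank-2 kernel below is.

```r
set.seed(42)
n <- 20
X <- matrix(rnorm(n * 2), n, 2)        # toy inputs (hypothetical)
K <- tcrossprod(X)                     # linear kernel, n x n; stands in for the real kernel matrix
y <- rnorm(n)                          # toy targets

lambda <- 0.1                          # ridge penalty; an assumed value, usually tuned by cross-validation

# alpha = (K + lambda * I)^(-1) y, solved as a linear system rather than by explicit inversion
alpha <- solve(K + lambda * diag(n), y)
```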

9.2 years ago by Jean-Karim Heriche 27k

0

Thanks a lot for your answer. I am quite lost, so I have to ask one more question: do you mean that I should abandon any dedicated R functions and simply compute the kernel ridge regression with the code you provided? Moreover, what does "y" refer to? Sorry for the elementary questions.

9.2 years ago by ' ▴ 330

1

y is your target vector or matrix, e.g. the response variables or classes of your samples that you're trying to model, just as in linear regression.

EDIT: Forgot to answer the part about abandoning R: The α parameters are obtained as the (pseudo)inverse of the kernel matrix. You can use any linear algebra library for this. solve() is an R function that you can use for this purpose or you can use the ginv() function from the MASS package or get the (pseudo)inverse from the SVD.
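The three routes mentioned here can be compared on a small example. The matrix A below is a hypothetical, well-conditioned stand-in for the matrix being inverted, and the SVD tolerance is one common choice rather than a prescribed one.

```r
library(MASS)                          # provides ginv()

set.seed(7)
# Toy symmetric positive-definite matrix (hypothetical stand-in)
A <- crossprod(matrix(rnorm(50), 10, 5)) + diag(5)
y <- rnorm(5)

a_solve <- solve(A, y)                 # solve the linear system A x = y directly
a_ginv  <- as.vector(ginv(A) %*% y)    # Moore-Penrose pseudoinverse from MASS

# Pseudoinverse from the SVD: A = U D V^T, so A^+ = V D^+ U^T
s     <- svd(A)
tol   <- max(dim(A)) * max(s$d) * .Machine$double.eps
d_inv <- ifelse(s$d > tol, 1 / s$d, 0) # invert only singular values above the tolerance
a_svd <- as.vector(s$v %*% (d_inv * crossprod(s$u, y)))
```

For an invertible matrix all three agree up to rounding. The pseudoinverse routes still return an answer when the matrix is singular, whereas solve() stops with an error.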

9.2 years ago by Jean-Karim Heriche 27k

0

Great! Then would it still make sense to find alpha by simply writing the original formula in R, i.e. alpha <- (K + lambda*I)^ -1 * y? Would this be as correct as alpha <- solve(K,y)?

9.2 years ago by ' ▴ 330

1

This is the same thing except you'd need to optimize for lambda which in the end would give you the same solution (the purpose of lambda is just to make the matrix invertible). solve() finds x as solution to the equation y = Zx, i.e. Z^-1y.
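One practical caveat when transcribing the formula: in R, ^ and * work element by element, so the matrix inverse has to come from solve() and the matrix product from %*%. A small sketch on a toy kernel, where the sizes and the value of lambda are assumptions for illustration:

```r
set.seed(3)
n <- 6
K <- crossprod(matrix(rnorm(2 * n * n), 2 * n, n))  # toy kernel matrix (hypothetical)
y <- rnorm(n)
lambda <- 0.5                           # assumed value for illustration
I <- diag(n)

# The formula written as matrix algebra: explicit inverse, then matrix product
alpha_inv <- as.vector(solve(K + lambda * I) %*% y)

# Same coefficients without forming the inverse explicitly
alpha_sys <- solve(K + lambda * I, y)

# Not equivalent: ^ and * are elementwise in R, so this is not a matrix inverse
alpha_elementwise <- (K + lambda * I)^-1 * y
```

Here alpha_inv and alpha_sys agree, while alpha_elementwise is an n x n matrix of reciprocals scaled by y. Note also that these coefficients coincide with solve(K, y) only when lambda is 0.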

9.2 years ago by Jean-Karim Heriche 27k

0

Fantastic. Thank you once again for your detailed response. I've been stuck on this for weeks, so I really appreciate your input. One last thing: can the prediction function g(x) for the actual KRR prediction task also be implemented this simply in R, or does that require more advanced programming? I am referring to this function:

[Image: formula for the KRR prediction function g(x)]

9.2 years ago by ' ▴ 330

1

To see what to do, put the equation into words: If we call K() a similarity function, the prediction for x is the weighted sum of the similarities of x to the training set elements.
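Put into code, the weighted sum is a matrix product between the similarities of the new points to the training points and alpha. The sketch below uses a one-dimensional toy problem so the fit can be drawn; the Gaussian kernel, the helper rbf_kernel and all parameter values are assumptions for illustration, not part of the original data.

```r
# Gaussian (RBF) kernel between the rows of A and B; an assumed choice for this toy example
rbf_kernel <- function(A, B, sigma = 1) {
  d2 <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * tcrossprod(A, B)
  exp(-d2 / (2 * sigma^2))
}

set.seed(11)
x_train <- matrix(sort(runif(40, 0, 5)), ncol = 1)
y_train <- sin(x_train[, 1]) + rnorm(40, sd = 0.1)

lambda  <- 0.1                          # assumed ridge penalty
K_train <- rbf_kernel(x_train, x_train)
alpha   <- solve(K_train + lambda * diag(nrow(K_train)), y_train)

# g(x) = sum_i alpha_i * K(x, x_i): one row of similarities per new point, times alpha
x_new <- matrix(seq(0, 5, length.out = 200), ncol = 1)
g_new <- as.vector(rbf_kernel(x_new, x_train) %*% alpha)

# Data points with the fitted curve on top (only drawn in an interactive session)
if (interactive()) {
  plot(x_train[, 1], y_train, xlab = "x", ylab = "y")
  lines(x_new[, 1], g_new, col = "red", lwd = 2)
}
```

For a Kronecker product kernel, the similarity between a new (drug, protein) pair and a training pair is the drug-drug similarity multiplied by the protein-protein similarity, so the same matrix product applies once those rows are assembled.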

9.2 years ago by Jean-Karim Heriche 27k