Question: A simple regression in R
0
gravatar for Learner
10 months ago by
Learner 200
Learner 200 wrote:

I am looking for a way to calculate AUC for a drug response . The data looks like the following

df<- structure(list(Compound = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L), .Label = "A", class = "factor"), conc = c(0.00372, 
0.01368, 0.05004, 0.1836, 0.6732, 2.472, 9.048, 33.24, 121.2, 
446.4, 1632, 6000), Count = c(10200000, 10300000, 10200000, 10200000, 
9927000, 9400000, 8464040, 6513840, 4349320, 3071480, 5989320, 
8444560)), class = "data.frame", row.names = c(NA, -12L))

is there anyone who could help me

R • 521 views
ADD COMMENTlink modified 10 months ago • written 10 months ago by Learner 200
4
gravatar for Kevin Blighe
10 months ago by
Kevin Blighe54k
Kevin Blighe54k wrote:

Your data looks like this:

df
   Compound      conc    Count conc.pred
1         A 3.720e-03 10200000  565.1263
2         A 1.368e-02 10300000  559.3499
3         A 5.004e-02 10200000  565.1263
4         A 1.836e-01 10200000  565.1263
5         A 6.732e-01  9927000  580.8959
6         A 2.472e+00  9400000  611.3376
7         A 9.048e+00  8464040  665.4025
8         A 3.324e+01  6513840  778.0542
9         A 1.212e+02  4349320  903.0858
10        A 4.464e+02  3071480  976.8991
11        A 1.632e+03  5989320  808.3526
12        A 6.000e+03  8444560  666.5278

What are you aiming to model? You cannot do anything with the compound column because it only has one value.

The only thing that we can do here is build a linear regression model between conc and Count:

model <- lm(conc ~ Count, data = df)
summary(model)

Call:
lm(formula = conc ~ Count, data = df)

Residuals:
   Min     1Q Median     3Q    Max 
-781.9 -620.7 -565.1 -552.1 5333.5 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.154e+03  1.836e+03   0.629    0.544
Count       -5.776e-05  2.176e-04  -0.265    0.796

Residual standard error: 1816 on 10 degrees of freedom
Multiple R-squared:  0.006999,  Adjusted R-squared:  -0.0923 
F-statistic: 0.07049 on 1 and 10 DF,  p-value: 0.796

It looks like there is no statistically significant relationship between these.

We can assess R^2 shrinkage using 3-fold cross-validation:

library(bootstrap)
theta.fit <- function(x, y) { lsfit(x,y) }
theta.predict <- function(fit, x) { cbind(1,x) %*% fit$coef }
x <- as.matrix(df[,which(colnames(df) %in% names(summary(model)$coefficients[,1]))])
y <- as.matrix(df[,c("conc")]) 
results <- crossval(x, y, theta.fit, theta.predict, ngroup = 3)

# Raw R^2 
cor(y, model$fitted.values)**2
            [,1]
[1,] 0.006999459

# cross-validated R^2
cor(y, results$cv.fit)**2
           [,1]
[1,] 0.07150013

It does not look like a good model, in fairness.

We can also plot the model-predicted conc values versus the real ones:

require(ggplot2)
df$conc.pred = predict(model)
ggplot(df, aes(x = conc, y = conc.pred)) +
  geom_point() +
  geom_smooth(method='lm',formula=y~x) +
  stat_smooth(method="lm", fullrange=TRUE) +
  ggtitle("My linear model")

dd

The confidence intervals (grey shade) are extraordinary.

Finally, we can plot the ROC curve:

require(pROC)
roc <- roc(df$conc, fitted(model), smooth=FALSE)
plot.roc(
  roc,
  grid=TRUE,
  grid.lwd=2,
  col="royalblue",
  main="My linear model")
text(0.3, 0.45,
  col="royalblue",
  paste("AUC (L95, AUC, U95):",
    paste(round(ci.auc(roc)[1:3], 3), sep=" ", collapse=","), sep=" "),
  cex=1.0)

Hopefully this can 'foment' some ideas for you.

If you have other compounds in your data and want to model them as the y variables in the model, then you can switch to using a generalised linear model (glm()) in place of the linear model (lm()).

Kevin

ADD COMMENTlink modified 10 months ago • written 10 months ago by Kevin Blighe54k

@Kevin Blighe Thanks Kevin, I thought that the concentration-response curves is based on the percent-viability scores which is often fit using smooth cubic splines for multivariate data . what I am mainly looking for is to calculate the AUC which should be only a value, right?

ADD REPLYlink written 10 months ago by Learner 200
1

I wrote and re-wrote the response to your comment a few times and decided on this: you should have added much more information to your original question (or perhaps I should have requested it). In my very brief example that I give (above), if the model assumptions were correct, then you would have a single value for AUC.

You are talking about multivariate data, but what do you mean? You just have 3 columns in your data, and the Compound column is just all 'A' values.

ADD REPLYlink written 10 months ago by Kevin Blighe54k

@Kevin Blighe, So, let me give more information and see your comments, I will absolutely like and accept your answer even if it does not lead me to a correct answer. So, lets assume that I have a cell line which I treat with a compound in several concentration and I obtain the sensitivity values. If I have a replicate, should I treat them as a multivariate calibration or just do it over each replicate separately?

ADD REPLYlink written 10 months ago by Learner 200

There is no right or wrong, I think. You can try them separate and then together (in a multivariate analysis). In the multivariate analysis, you could probably add cell-line as a covariate to see if there are cell-line specific effects in relation to drug sensitivity, too.

ADD REPLYlink written 10 months ago by Kevin Blighe54k

@Kevin Blighe could you please use the following data in your example? I am just wondering how you could do that

df<-structure(list(Drug_name = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "DrugT", class = "factor"), 
        Cell_line = structure(c(2L, 3L, 1L, 4L, 5L), .Label = c("Celline3", 
        "Cellline1", "Cellline2", "Cellline4", "Cellline5"), class = "factor"), 
        Concentrations_nM = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "3.1e-05,0.000114,0.000417,0.00153,0.00561,0.0206,0.0754,0.277,1.01,3.72", class = "factor"), 
        Responses = structure(c(4L, 2L, 3L, 5L, 1L), .Label = c("1,0.948118156866697,0.920303987764611,1.03610743904536,1.08332987533419,0.960086785898477,0.765642506120658,0.572520170014998,0.375835106792894,0.254180720963181", 
        "1,0.953703703703704,1,0.972222222222222,0.972222222222222,0.911407407407407,0.835722222222222,0.651211111111111,0.454325925925926,0.293437037037037", 
        "1,0.97679357161345,0.923286430268948,0.933051365914492,0.938592672768771,0.834813760323772,0.738458551965929,0.56534271394621,0.428992446881103,0.287483234899645", 
        "1,1.02970297029703,1.01980198019802,1.00990099009901,1.03960396039604,0.945219801980198,0.887081188118812,0.657615841584158,0.454776237623762,0.319762376237624", 
        "1,1.03893565587121,0.967905086838,1.06062943976775,0.886153353792697,0.897337425206165,0.719798149513376,0.553963001697327,0.372291848829301,0.301689897760328"
        ), class = "factor")), class = "data.frame", row.names = c(NA, 
    -5L))
ADD REPLYlink modified 10 months ago by Kevin Blighe54k • written 10 months ago by Learner 200
3

Hey, can you not simply repeat my same code with your new data?

ADD REPLYlink written 10 months ago by Kevin Blighe54k

@Kevin Blighe do you know the cell to cell interaction

ADD REPLYlink written 10 months ago by Learner 200

@Kevin Blighe you wrote me this to see if there are cell-line specific effects in relation to drug sensitivity actually I have been reading a lot about this. no one uses lm for such a analysis. the worse one can do is liner because basically it is not linear but nonlinear and sigmoidal ! however, thanks for all your time

ADD REPLYlink modified 10 months ago • written 10 months ago by Learner 200

no one uses lm for such a analysis. the worse one can do is liner because basically it is not linear but nonlinear and sigmoidal !

Only a naïve person would rigidly adhere to this statement ^

ADD REPLYlink written 10 months ago by Kevin Blighe54k

@Kevin Blighe no offend, I am naive for sure and I am learning without any problem. I simply pointed out the issue I see with the built model. No offend whatsoever. I just wanted to inform you about my findings

ADD REPLYlink written 10 months ago by Learner 200
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1668 users visited in the last hour