Your data looks like this:

```
df
Compound conc Count conc.pred
1 A 3.720e-03 10200000 565.1263
2 A 1.368e-02 10300000 559.3499
3 A 5.004e-02 10200000 565.1263
4 A 1.836e-01 10200000 565.1263
5 A 6.732e-01 9927000 580.8959
6 A 2.472e+00 9400000 611.3376
7 A 9.048e+00 8464040 665.4025
8 A 3.324e+01 6513840 778.0542
9 A 1.212e+02 4349320 903.0858
10 A 4.464e+02 3071480 976.8991
11 A 1.632e+03 5989320 808.3526
12 A 6.000e+03 8444560 666.5278
```

What are you aiming to model? You cannot do anything with the compound column because it only has one value.

The only thing that we can do here is build a linear regression model between `conc`

and `Count`

:

```
model <- lm(conc ~ Count, data = df)
summary(model)
Call:
lm(formula = conc ~ Count, data = df)
Residuals:
Min 1Q Median 3Q Max
-781.9 -620.7 -565.1 -552.1 5333.5
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.154e+03 1.836e+03 0.629 0.544
Count -5.776e-05 2.176e-04 -0.265 0.796
Residual standard error: 1816 on 10 degrees of freedom
Multiple R-squared: 0.006999, Adjusted R-squared: -0.0923
F-statistic: 0.07049 on 1 and 10 DF, p-value: 0.796
```

It looks like there is no statistically significant relationship between these.

We can assess R^2 shrinkage using 3-fold cross-validation:

```
library(bootstrap)
theta.fit <- function(x, y) { lsfit(x,y) }
theta.predict <- function(fit, x) { cbind(1,x) %*% fit$coef }
x <- as.matrix(df[,which(colnames(df) %in% names(summary(model)$coefficients[,1]))])
y <- as.matrix(df[,c("conc")])
results <- crossval(x, y, theta.fit, theta.predict, ngroup = 3)
# Raw R^2
cor(y, model$fitted.values)**2
[,1]
[1,] 0.006999459
# cross-validated R^2
cor(y, results$cv.fit)**2
[,1]
[1,] 0.07150013
```

It does not look like a good model, in fairness.

We can also plot the model-predicted `conc`

values versus the real ones:

```
require(ggplot2)
df$conc.pred = predict(model)
ggplot(df, aes(x = conc, y = conc.pred)) +
geom_point() +
geom_smooth(method='lm',formula=y~x) +
stat_smooth(method="lm", fullrange=TRUE) +
ggtitle("My linear model")
```

The confidence intervals (grey shade) are extraordinary.

Finally, we can plot the ROC curve:

```
require(pROC)
roc <- roc(df$conc, fitted(model), smooth=FALSE)
plot.roc(
roc,
grid=TRUE,
grid.lwd=2,
col="royalblue",
main="My linear model")
text(0.3, 0.45,
col="royalblue",
paste("AUC (L95, AUC, U95):",
paste(round(ci.auc(roc)[1:3], 3), sep=" ", collapse=","), sep=" "),
cex=1.0)
```

Hopefully this can 'foment' some ideas for you.

If you have other compounds in your data and want to model them as the y variables in the model, then you can switch to using a generalised linear model (`glm()`

) in place of the linear model (`lm()`

).

Kevin

@Kevin Blighe Thanks Kevin, I thought that the concentration-response curves is based on the percent-viability scores which is often fit using smooth cubic splines for multivariate data . what I am mainly looking for is to calculate the AUC which should be only a value, right?

I wrote and re-wrote the response to your comment a few times and decided on this: you should have added much more information to your original question (or perhaps I should have requested it). In my very brief example that I give (above), if the model assumptions were correct, then you would have a single value for AUC.

You are talking about multivariate data, but what do you mean? You just have 3 columns in your data, and the

`Compound`

column is just all 'A' values.@Kevin Blighe, So, let me give more information and see your comments, I will absolutely like and accept your answer even if it does not lead me to a correct answer. So, lets assume that I have a cell line which I treat with a compound in several concentration and I obtain the sensitivity values. If I have a replicate, should I treat them as a multivariate calibration or just do it over each replicate separately?

There is no right or wrong, I think. You can try them separate and then together (in a multivariate analysis). In the multivariate analysis, you could probably add

`cell-line`

as a covariate to see if there are cell-line specific effects in relation to drug sensitivity, too.@Kevin Blighe could you please use the following data in your example? I am just wondering how you could do that

Hey, can you not simply repeat my same code with your new data?

@Kevin Blighe do you know the cell to cell interaction

@Kevin Blighe you wrote me this

`to see if there are cell-line specific effects in relation to drug sensitivity`

actually I have been reading a lot about this. no one uses lm for such a analysis. the worse one can do is liner because basically it is not linear but nonlinear and sigmoidal ! however, thanks for all your time