Question: Help with statistics for treatment control analysis using R
Gene-ticks0 wrote:

Hi all, I have cancer and WT control samples from two different groups for size comparison. I would like to do some statistics to calculate a p-value and perhaps get some plots. I am not very familiar with statistical analysis and was wondering if someone could teach me how to analyze this type of data. Thanks for your time. My data:

```r
df <- structure(list(Group = c(1L, 1L, 1L, 2L, 2L, 2L, 2L),
                     cancer = c(0.7, 0.7, 0.6, 0.65, 1, 0.75, 0.3),
                     WTcontrol = c(1.1, 0.8, 0.7, 1.4, 1, 1, 1.05)),
                .Names = c("Group", "cancer", "WTcontrol"),
                class = "data.frame", row.names = c(NA, -7L))
```
modified 20 months ago by manuel.belmadani1.2k • written 20 months ago by Gene-ticks0

What is your question, and what are you trying to analyze? I see a Group vector, a cancer vector and a control vector. Are you trying to determine whether the values in the cancer vector are statistically different from the values in the control? Is each vector supposed to be split by "Group", e.g. Group 1 is before drug treatment and Group 2 is after? Are the samples paired between WT and cancer?

Those are important questions for the analysis. I can probably suggest some tests and methods for you to look at, but you should provide more information about what you want to analyze.

There's a good chance that what you want to do is a t-test, so I would take a look at: https://uc-r.github.io/t_test
and http://rstudio-pubs-static.s3.amazonaws.com/332835_b96d3bd2ce4b416f9ebfa8d7664e8e13.html
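As a rough illustration of what those tutorials cover: if the cancer and WT measurements turn out to be matched row by row (an assumption at this point), a paired t-test in base R could look like the sketch below. It pools both groups and only asks whether cancer differs from WT within pairs, so it is a starting point rather than a full analysis.

```r
# Data as posted in the question; each row is assumed to be one matched cancer/WT pair.
df <- data.frame(
  Group     = c(1L, 1L, 1L, 2L, 2L, 2L, 2L),
  cancer    = c(0.7, 0.7, 0.6, 0.65, 1, 0.75, 0.3),
  WTcontrol = c(1.1, 0.8, 0.7, 1.4, 1, 1, 1.05)
)

# Paired t-test: is the mean within-pair difference (cancer - WT) different from zero?
paired <- t.test(df$cancer, df$WTcontrol, paired = TRUE)
print(paired)

# The reported estimate is simply the mean of the per-pair differences.
mean(df$cancer - df$WTcontrol)
```

Dropping `paired = TRUE` would give an unpaired (Welch) test instead, which ignores the matching.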

R has a lot of built-in plotting functions but I would recommend investing some time in learning ggplot2: https://ggplot2.tidyverse.org/ There's a learning curve but the pay-off of learning ggplot2 well is huge.

Actually, Group 1 and Group 2 are the experiments performed at two time points. The measurement is the diameter of the lesions. Yes, the samples are paired.


Since your samples are paired, I assume you want to know whether Group 2 differs from Group 1 when comparing the cancer sample with the WT sample. I have no idea if that makes sense experimentally (which is why I suggest you come up with a clear answer to: what is it that you want to analyze? What are you trying to quantify?), but you could compute the ratio of lesion diameters between matched samples and fit a linear model to see whether the ratio is affected by the group.

```r
> df$ratio <- df$cancer / df$WTcontrol
> df$Group <- factor(df$Group)  # step implied by the output below: Group must be a factor
> df$Group
[1] 1 1 1 2 2 2 2
Levels: 1 2
> summary(lm(formula = ratio ~ Group, data = df))

Call:
lm(formula = ratio ~ Group, data = df)

Residuals:
       1        2        3        4        5        6        7
-0.15314  0.08550  0.06764 -0.16071  0.37500  0.12500 -0.33929

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.7895     0.1489   5.303  0.00319 **
Group2       -0.1645     0.1970  -0.835  0.44168
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2579 on 5 degrees of freedom
Multiple R-squared:  0.1224,    Adjusted R-squared:  -0.05309
F-statistic: 0.6975 on 1 and 5 DF,  p-value: 0.4417
```

If you look under coefficients for "Group2", you'll see an "estimate" (effect size of Group2 versus Group1), a t-value and a p-value. It seems like the difference is not significant (p=0.44168), when analyzed this way at least.
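If you want those numbers programmatically rather than reading them off the printout, something like the following sketch should work. It also shows that, with a single two-level factor as predictor, the linear model is the same comparison as a two-sample t-test with equal variances. The object names `fit`, `coefs` and `tt` are just for illustration.

```r
# Rebuild the data from the question, with Group as a factor and the cancer/WT ratio.
df <- data.frame(
  Group     = factor(c(1, 1, 1, 2, 2, 2, 2)),
  cancer    = c(0.7, 0.7, 0.6, 0.65, 1, 0.75, 0.3),
  WTcontrol = c(1.1, 0.8, 0.7, 1.4, 1, 1, 1.05)
)
df$ratio <- df$cancer / df$WTcontrol

fit <- lm(ratio ~ Group, data = df)

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(fit))
coefs["Group2", ]

# The intercept is the mean ratio in Group 1; Group2 is the difference in mean ratio.
tapply(df$ratio, df$Group, mean)

# Equivalent two-sample t-test (equal variances assumed, as in lm)
tt <- t.test(ratio ~ Group, data = df, var.equal = TRUE)
tt$p.value
```

The default `t.test()` uses the Welch correction instead (`var.equal = FALSE`), which would give a slightly different p-value.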

You should read more on linear regression in R.

```r
library(ggplot2)
# factor(Group) gives discrete colours even if Group is still stored as an integer
ggplot(df) +
  geom_point(mapping = aes(x = cancer, y = WTcontrol, color = factor(Group)), size = 5) +
  labs(color = "Group")
```

Visually, there doesn't seem to be any correlation between cancer and WT: at time point 2 the control is more or less the same except for one outlier, while cancer varies a lot. Group 1 seems a bit more consistent between the two members of each pair, but with only 3 data points it's hard to say much about it.
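To put a number on that visual impression, a quick check with base R's `cor()` and `cor.test()` might look like this. It is only a sketch: with 7 pairs overall (and 3 or 4 per group) the estimates are very unstable, so treat them as descriptive.

```r
# Data as posted in the question.
df <- data.frame(
  Group     = factor(c(1, 1, 1, 2, 2, 2, 2)),
  cancer    = c(0.7, 0.7, 0.6, 0.65, 1, 0.75, 0.3),
  WTcontrol = c(1.1, 0.8, 0.7, 1.4, 1, 1, 1.05)
)

# Pearson correlation between cancer and WT across all pairs
r <- cor(df$cancer, df$WTcontrol)

# Same estimate with a test and confidence interval
ct <- cor.test(df$cancer, df$WTcontrol)
print(ct)

# Correlation within each time point (Group)
by_group <- sapply(split(df, df$Group),
                   function(d) cor(d$cancer, d$WTcontrol))
by_group
```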