One-way ANOVA in R for many observations
2
1
Entering edit mode
3.7 years ago
darzenis ▴ 30

Hello,

I am trying to do one-way ANOVA in R to check for significant variations in biochemical concentrations between treatment groups.

This is how my data table is set up:

Treatment     Biochem_1     Biochem_2
A                   2.33                0.42
A                   3.21                0.56
B                   1.21                1.34
B                   2.11                0.99


I am able to run a simple code for one of the biochemicals (collumns):

biochem_1 <- aov(biochem_1 ~ Treatment, data = data_name)


And then get a summary of stat results:

summary(biochem_1)


The problem is that I have over a hundred biochemicals (collumns) in my data set and I do not want to run the same code and over again by manually changing the collumn name.

How could I write my code to run ANOVA for all the biochemicals?

Thank you!

R • 6.4k views
9
Entering edit mode
3.7 years ago

Edit: 3rd December 2018

Edit: 27th December 2020

The original solution using a for loop is below. Here is a simpler / quicker solution using lapply:

# create test data
df <- data.frame(
Treatment=c("A","A","B","B"),
Biochem_1=c(2.33, 3.21, 1.21, 2.11),
Biochem_2=c(0.42, 0.56, 1.34, 0.99),
Biochem_3=c(3,2,1,1))

df
Treatment Biochem_1 Biochem_2 Biochem_3
1         A      2.33      0.42         3
2         A      3.21      0.56         2
3         B      1.21      1.34         1
4         B      2.11      0.99         1


# set Treatment as factor

df$Treatment <- factor(df$Treatment, levels=c("A","B"))


# store all formulae in a list

formulae <- lapply(colnames(df)[2:ncol(df)], function(x) as.formula(paste0(x, " ~ Treatment")))


# go through list and run aov()

res <- lapply(formulae, function(x) summary(aov(x, data = df)))
names(res) <- format(formulae)
res
$Biochem_1 ~ Treatment Df Sum Sq Mean Sq F value Pr(>F) Treatment 1 1.2321 1.2321 3.111 0.22 Residuals 2 0.7922 0.3961$Biochem_2 ~ Treatment
Df Sum Sq Mean Sq F value Pr(>F)
Treatment    1 0.4556  0.4556   12.82 0.0699 .
Residuals    2 0.0711  0.0355
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

$Biochem_3 ~ Treatment Df Sum Sq Mean Sq F value Pr(>F) Treatment 1 2.25 2.25 9 0.0955 . Residuals 2 0.50 0.25 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1  # extract just p-values p <- unlist(lapply(res, function(x) x[]$"Pr(>F)"))
p
Biochem_1 ~ Treatment Biochem_2 ~ Treatment Biochem_3 ~ Treatment
0.21983666            0.06989405            0.09546597

data.frame(
Biochemical = sub(' ~ Treatment', '', names(p)),
pvalue = p)
Biochemical     pvalue
Biochem_1 ~ Treatment   Biochem_1 0.21983666
Biochem_2 ~ Treatment   Biochem_2 0.06989405
Biochem_3 ~ Treatment   Biochem_3 0.09546597


## ----------------------------------------------------------------------------------

Here you go, Sire:

data_name
Treatment Biochem_1 Biochem_2
1         A      2.33      0.42
2         A      3.21      0.56
3         B      1.21      1.34
4         B      2.11      0.99

#Ensure that category 'A' is the reference level (does not make much difference here)
data_name$Treatment <- factor(data_name$Treatment, levels=c("A","B"))

for (i in 2:ncol(data_name)) {
formula <- as.formula(paste(colnames(data_name)[i], " ~ Treatment", sep=""))
model <- aov(formula, data = data_name)

cat("\n-----\n\n")
cat(colnames(data_name)[i])
cat("\n")
print(summary(model))
}

-----

Biochem_1
Df Sum Sq Mean Sq F value Pr(>F)
Treatment    1 1.2321  1.2321   3.111   0.22
Residuals    2 0.7922  0.3961

-----

Biochem_2
Df Sum Sq Mean Sq F value Pr(>F)
Treatment    1 0.4556  0.4556   12.82 0.0699 .
Residuals    2 0.0710  0.0355
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


You can also wrap everything in sink() commands so that all output goes directly to a file:

sink("Output.txt")
My Code
sink()

0
Entering edit mode

Also take a look at my code here to see how you could run many hundreds of thousands of models via parallel processing: R functions edited for parallel processing (see the section entitled Any type of LM, GLM, clogit (here: clogit))

1
Entering edit mode

Thank you! This was very helpful!

0
Entering edit mode

@Kevin Blighe

Is it possible that I can export a table with two columns: one for the observation and one for its calculated p-value?

Thanks,

0
Entering edit mode

Yes, sorry, I have modified the procedure to do this. Start from 'go through list and run aov()'

1
Entering edit mode

Thank you so much, The script works perfectly.

Additionally, is it possible to have the pair-wise posthoc test p-value after ANOVA? It will be really helpful to have this statistical test data in the same result table. My apology for this additional question.

Thanks,

0
Entering edit mode

like this?

condition data             model      term      contrast null.value estimate conf.low conf.high adj.p.value
<chr>     <list>           <list>     <chr>     <chr>         <dbl>    <dbl>    <dbl>     <dbl>       <dbl>
1 Biochem_1 <tibble [4 × 2]> <TukeyHSD> Treatment B-A               0   -1.11    -3.82      1.60       0.220
2 Biochem_2 <tibble [4 × 2]> <TukeyHSD> Treatment B-A               0    0.675   -0.135     1.49       0.0698
3 Biochem_3 <tibble [4 × 2]> <TukeyHSD> Treatment B-A               0   -1.5     -3.65      0.649      0.0954

0
Entering edit mode

Thank you so much for your reply. I would like to have all data in one table like this one (suppose to have 4 variables instead of 2 as the above example):

          pvalue_anova value_constrast_A_vs_B pvalue_constrast_A_vs_C pvalue_constrast_A_vs_D
biochem_1            1                      2                       3                       4
biochem_2            8                      9                      10                      11
biochem_3           15                     16                      17                      18
pvalue_constrast_B_vs_C pvalue_constrast_B_vs_D pvalue_constrast_C_vs_D
biochem_1                       5                       6                       7
biochem_2                      12                      13                      14
biochem_3                      19                      20                      21


Thanks,

1
Entering edit mode
3.7 years ago
brandontsinn ▴ 10

Here is a link to a software carpentry workshop on loops (and their faster alternatives) in R, that I think you might find to be useful.