Edit: 3rd December 2018
Edit: 27th December 2020
The original solution using a for
loop is below. Here is a simpler / quicker solution using lapply
:
# create test data
df <- data.frame(
Treatment=c("A","A","B","B"),
Biochem_1=c(2.33, 3.21, 1.21, 2.11),
Biochem_2=c(0.42, 0.56, 1.34, 0.99),
Biochem_3=c(3,2,1,1))
df
Treatment Biochem_1 Biochem_2 Biochem_3
1 A 2.33 0.42 3
2 A 3.21 0.56 2
3 B 1.21 1.34 1
4 B 2.11 0.99 1
set Treatment
as factor
df$Treatment <- factor(df$Treatment, levels=c("A","B"))
store all formulae in a list
formulae <- lapply(colnames(df)[2:ncol(df)], function(x) as.formula(paste0(x, " ~ Treatment")))
go through list and run aov()
res <- lapply(formulae, function(x) summary(aov(x, data = df)))
names(res) <- format(formulae)
res
$`Biochem_1 ~ Treatment`
Df Sum Sq Mean Sq F value Pr(>F)
Treatment 1 1.2321 1.2321 3.111 0.22
Residuals 2 0.7922 0.3961
$`Biochem_2 ~ Treatment`
Df Sum Sq Mean Sq F value Pr(>F)
Treatment 1 0.4556 0.4556 12.82 0.0699 .
Residuals 2 0.0711 0.0355
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
$`Biochem_3 ~ Treatment`
Df Sum Sq Mean Sq F value Pr(>F)
Treatment 1 2.25 2.25 9 0.0955 .
Residuals 2 0.50 0.25
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
extract just p-values
p <- unlist(lapply(res, function(x) x[[1]]$"Pr(>F)"[1]))
p
Biochem_1 ~ Treatment Biochem_2 ~ Treatment Biochem_3 ~ Treatment
0.21983666 0.06989405 0.09546597
data.frame(
Biochemical = sub(' ~ Treatment', '', names(p)),
pvalue = p)
Biochemical pvalue
Biochem_1 ~ Treatment Biochem_1 0.21983666
Biochem_2 ~ Treatment Biochem_2 0.06989405
Biochem_3 ~ Treatment Biochem_3 0.09546597
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
Original answer:
Here you go, Sire:
data_name
Treatment Biochem_1 Biochem_2
1 A 2.33 0.42
2 A 3.21 0.56
3 B 1.21 1.34
4 B 2.11 0.99
#Ensure that category 'A' is the reference level (does not make much difference here)
data_name$Treatment <- factor(data_name$Treatment, levels=c("A","B"))
for (i in 2:ncol(data_name)) {
formula <- as.formula(paste(colnames(data_name)[i], " ~ Treatment", sep=""))
model <- aov(formula, data = data_name)
cat("\n-----\n\n")
cat(colnames(data_name)[i])
cat("\n")
print(summary(model))
}
-----
Biochem_1
Df Sum Sq Mean Sq F value Pr(>F)
Treatment 1 1.2321 1.2321 3.111 0.22
Residuals 2 0.7922 0.3961
-----
Biochem_2
Df Sum Sq Mean Sq F value Pr(>F)
Treatment 1 0.4556 0.4556 12.82 0.0699 .
Residuals 2 0.0710 0.0355
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
You can also wrap everything in sink()
commands so that all output goes directly to a file:
sink("Output.txt")
My Code
sink()
Also take a look at my code here to see how you could run many hundreds of thousands of models via parallel processing: R functions edited for parallel processing (see the section entitled Any type of LM, GLM, clogit (here: clogit))
Thank you! This was very helpful!
@Kevin Blighe
Is it possible that I can export a table with two columns: one for the observation and one for its calculated p-value?
Thanks,
Yes, sorry, I have modified the procedure to do this. Start from 'go through list and run aov()'
Thank you so much, The script works perfectly.
Additionally, is it possible to have the pair-wise posthoc test p-value after ANOVA? It will be really helpful to have this statistical test data in the same result table. My apology for this additional question.
Thanks,
like this?
Hi cpad0112,
Thank you so much for your reply. I would like to have all data in one table like this one (suppose to have 4 variables instead of 2 as the above example):
Thanks,
I realize it's been many years but hopefully someone can still answer my question:
Is there any way I can extract the residuals instead of the p values when using the lapply method? I've tried every way I can think of but it just keeps replying "NULL"
Thank you!