2.2 years ago by

UK, U. Glasgow

Just so you are aware:
An assumption of ANOVA is that the standard deviations are identical in each category.
The t-test that R uses does not, by default, assume identical standard deviations in the two categories (it runs Welch's test), although textbooks commonly make that assumption. By setting `var.equal=TRUE` in the `t.test` call, you can recover the same p-value as obtained from an ANOVA implementation:

```
# Example using unbalanced data
set.seed(1); library(magrittr)
x <- c(rep('a', 15), rep('b', 5)) %>% factor
y <- rnorm(20)
# t-Test with default settings: ie, equal sd for each group is not assumed (var.equal = FALSE)
t.test(formula = y ~ x)
# Welch Two Sample t-test
#
# data: y by x
# t = -1.0707, df = 15.609, p-value = 0.3006
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
# -1.0704444 0.3529962
# sample estimates:
# mean in group a mean in group b
# 0.1008428 0.4595670
## t-test assuming sds are identical
t.test(formula = y ~ x, var.equal = TRUE)
# Two Sample t-test
#
# data: y by x
# t = -0.7519, df = 18, p-value = 0.4618 ## <<<<--- p-values differ between the two t-tests
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
# -1.3610547 0.6436064
# sample estimates:
# mean in group a mean in group b
# 0.1008428 0.4595670
## ANOVA
lm(y ~ x) %>% summary
# Call:
# lm(formula = y ~ x)
#
# Residuals:
# Min 1Q Median 3Q Max
# -2.3155 -0.5589 0.1815 0.4773 1.4944
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.1008 0.2385 0.423 0.677
# xb 0.3587 0.4771 0.752 0.462
#
# Residual standard error: 0.9239 on 18 degrees of freedom
# Multiple R-squared: 0.03045, Adjusted R-squared: -0.02341
# F-statistic: 0.5654 on 1 and 18 DF, p-value: 0.4618 ## <<<-- p-value matches that obtained from the equal-variance t-test
```

So, although the textbooks may tell you that the t-test is equivalent to one-way/two-group ANOVA, that is only really true if you assume that the variances are equal in the two groups. And, in particular, whenever you use a statistical test in a computational package, it's *really* valuable to know what implementation of a test you are actually using.
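The equivalence can be checked numerically in both directions. The sketch below uses simulated data (the variable names `g` and `v` and the seed are arbitrary choices for illustration). It compares the pooled-variance t-test with the classical ANOVA F-test, and the Welch t-test with `oneway.test`, whose default is `var.equal = FALSE`.

```r
# Simulated, unbalanced two-group data (illustrative only)
set.seed(42)
g <- factor(c(rep("a", 15), rep("b", 5)))
v <- rnorm(20)

# Pooled-variance t-test versus classical one-way ANOVA
t_pooled  <- t.test(v ~ g, var.equal = TRUE)
f_classic <- anova(lm(v ~ g))
all.equal(t_pooled$p.value, f_classic[["Pr(>F)"]][1])               # same p-value
all.equal(unname(t_pooled$statistic)^2, f_classic[["F value"]][1])  # F equals t squared

# Welch t-test versus the Welch-type one-way test (oneway.test default)
t_welch <- t.test(v ~ g)
f_welch <- oneway.test(v ~ g)
all.equal(t_welch$p.value, f_welch$p.value)                         # same p-value
```

With only two groups, each flavour of t-test therefore has an ANOVA counterpart; a mismatch appears only when the two procedures make different variance assumptions.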

P.S. I think this question fits happily on Biostars.

modified 2.2 years ago • written 2.2 years ago by russhh • 4.1k

Could you post the code that you used, please? Also, could you state whether your experiment is balanced, that is, is there the same number of samples for category 0 as for category 1?

The code is quite simple:

d<-read.delim("mydata.txt"); attach(d); d1<-subset(d[,"score"], category == 0); d2<-subset(d[,"score"], category == 1);

## t.test

t.test(d1, d2, var.equal=T);

## glm

summary(glm(score ~ category));

With "var.equal=F", t.test & glm gave different p-values. With "var.equal=T" they yielded the same p-value.

My experiment is not balanced and not paired.
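For reference, the same comparison can be written with the formula interface and a `data =` argument, which avoids `attach()` and manual subsetting. This is only a sketch: the data frame below is simulated as a stand-in for `mydata.txt`, which is not available here, and it assumes the same column names (`score`, `category`) as the code above.

```r
# Simulated stand-in for mydata.txt (the real file is not available here)
set.seed(1)
d <- data.frame(
  category = c(rep(0, 15), rep(1, 5)),  # unbalanced, unpaired
  score    = rnorm(20)
)

# Formula interface: no attach() and no manual subsetting needed
tt_equal <- t.test(score ~ category, data = d, var.equal = TRUE)
tt_welch <- t.test(score ~ category, data = d)  # default: var.equal = FALSE

# glm with the default gaussian family is an ordinary least-squares fit
fit   <- glm(score ~ category, data = d)
p_glm <- summary(fit)$coefficients["category", "Pr(>|t|)"]

# The pooled-variance t-test and the glm agree; the Welch test generally does not
all.equal(tt_equal$p.value, p_glm)
c(welch = tt_welch$p.value, pooled = tt_equal$p.value, glm = p_glm)
```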

If the y variable is only 0|1, it would be more appropriate to do a logistic regression, e.g. `summary(glm(y~x, family='binomial'))`. This will also give you an odds ratio (the exponentiated coefficient), an estimate of how much a one-unit increase in x raises or lowers the odds of getting y==1.

In general I think using a regression has two advantages over a t-test: 1) you get an effect estimate (an odds ratio, in the logistic case) in addition to a p-value; 2) you can easily add more terms if there are other variables to account for.
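A minimal sketch of both points, on simulated data: the variables `x`, `z` and `y`, the seed, and the coefficients used to generate `y` are all made up for illustration and have nothing to do with the original data set.

```r
# Simulated data (illustrative only): binary outcome y, numeric predictors x and z
set.seed(7)
n <- 200
x <- rnorm(n)
z <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * x))

# Logistic regression of y on x
fit1 <- glm(y ~ x, family = binomial)
summary(fit1)$coefficients

# Exponentiated coefficients are odds ratios: the multiplicative change in
# the odds of y == 1 per one-unit increase in the predictor
exp(coef(fit1))

# Wald 95% confidence intervals on the odds-ratio scale
exp(confint.default(fit1))

# Adding another variable is a one-term change to the formula
fit2 <- glm(y ~ x + z, family = binomial)
exp(coef(fit2))
```

Note that R models the probability of `y == 1` when `y` is coded 0/1, so an odds ratio above 1 for `x` means higher odds of `y == 1` as `x` increases.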

Yeah, good points. Thanks.
