Test Whether The Variance In A Group Is Lower Than In Another
5
4
Entering edit mode
11.7 years ago

I have two groups of data (not distributed under a normal distribution): I would like to test the hypothesis that the first group has a lower (or narrower) standard deviation than the other.

An alternative explanation to this is that I would like to tell whether the first group is less 'variable', 'heterogeneous', than the first.

A kruskal-wallis won't do it because it compares the medians of two or more groups, and I am not interested in that.

A Levene or a Brown-Forsynth test compare the variance between the two groups and tell whether they have the same variance. This is better, but I would also like to tell if the variance in the first group is lower than in the other(s) group(s).

A simple Chi-Square test would tell me whether the standard deviation of a group is equal to a certain value, and the one-tailed version can tell me whether it is higher/lower.

An additional difficulty is that I would have to do this test as a two-way, because I have two grouping variables, but I would like to ask you if you can point me to any direction or give me some hint, I have not many ideas on where to search :-)

statistics r • 12k views
0
Entering edit mode

What is your non-normality assumption based on? Have you thought about transforming the data (with log transformation, for example) to be more normal?

0
Entering edit mode

You might also want to ask that question on stats.stackexchange.com. It's populated by lots of true-blooded statisticians who eat this stuff for breakfast.

0
Entering edit mode

Hi Giovanni, how did you end up solving this? I ran into a very similar problem.

8
Entering edit mode
11.7 years ago
Matt Parker ▴ 80

Is bootstrapping a possibility? Resample from your data, calculate the variance, repeat. This should leave you with a vector of bootstrapped variance estimates for each of your desired groups. Perform the appropriate test on those estimates (e.g., t-test if you're comparing two groups and the estimates turn out normally).

I think the boot package is the norm for resampling in R, but here's some untested code to clarify the idea:

n <- 1000
x <- rnorm(mean = 0, sd = 1, n = n)
y <- rnorm(mean = 0, sd = 1.1, n = n)

nboots <- 10000
bootvar.x <- vector(mode = "numeric", length = nboots)
bootvar.y <- vector(mode = "numeric", length = nboots)

for(i in seq_len(nboots)){
bootvar.x[i] <- var(sample(x, size = n, replace = TRUE))
bootvar.y[i] <- var(sample(y, size = n, replace = TRUE))
}

require(ggplot2)
#Probably a better way to do this
bootvar.x2 <- data.frame(var = bootvar.x, group = "x")
bootvar.y2 <- data.frame(var = bootvar.y, group = "y")
bootvars <- rbind(bootvar.x2, bootvar.y2)

ggplot(bootvars, aes(x = var, group = group, colour = group)) + geom_density()

t.test(bootvar.x, bootvar.y)


Disclaimer: I've read a bit about bootstrapping. Please don't assume I actually know anything. This is just a suggestion for something to check out.

3
Entering edit mode
11.2 years ago
hurfdurf ▴ 490

If this data is really non-normal, should you be using variance or standard deviation at all?

You might want to use more robust metrics like [?]median absolute deviation[?].

2
Entering edit mode
11.7 years ago

Look for the F-test or Bartlett's test. As your data is non-normal you need something more robust against deviation from normality. Leven's test is for example mentioned as an alternative

0
Entering edit mode

thanks, I forgot to say that I also looked at the Bartlett's test, but discarded it because it is sensitive to departures from normality and my data is not normal. Thanks anyway.

0
Entering edit mode

Then Forsythe test maybe? Look at the section: "Comparison with Levene's test"

2
Entering edit mode
11.2 years ago

Another alternative:

Transform the data by subtracting the mean (or median) from each data point and take the absolute values.

Now check the normality of each sample again and use a t-test or KS test as appropriate.

1
Entering edit mode
11.7 years ago

You can try a Friedman test at first for each factor (assuming they're independent) and, given that really there is some difference, proceed an adequate multiple hypothesis testing using Bonferroni method, for example. Not a sequential hypothesis testing like we usually do with microarray data. You'll need to specifiy all concurrent hypothesis (variance =, <, >) and significance/power levels.