Test Whether The Variance In A Group Is Lower Than In Another
11.7 years ago

I have two groups of data (not normally distributed), and I would like to test the hypothesis that the first group has a lower (or narrower) standard deviation than the other.

Another way to put it: I would like to tell whether the first group is less 'variable' or 'heterogeneous' than the second.

A Kruskal-Wallis test won't do, because it compares the medians of two or more groups, and I am not interested in that.

A Levene or Brown-Forsythe test compares the variances of the two groups and tells whether they are equal. This is better, but I would also like to tell whether the variance in the first group is lower than in the other group(s).

A simple Chi-Square test would tell me whether the standard deviation of a group is equal to a certain value, and the one-tailed version can tell me whether it is higher/lower.
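For reference, here is a minimal sketch of that one-sample chi-square test on a variance. The data and the reference value `sigma0` are made up for illustration, and the test itself assumes normally distributed data, so it is probably not a good fit for the situation described here:

```r
# One-sample chi-square test on a variance (assumes normal data).
# x and sigma0 are hypothetical values chosen only for illustration.
set.seed(1)
x <- rnorm(50, mean = 0, sd = 0.8)
sigma0 <- 1                                  # reference standard deviation

n <- length(x)
stat <- (n - 1) * var(x) / sigma0^2          # chi-square with n - 1 df under H0

p_lower <- pchisq(stat, df = n - 1)                      # H1: true sd < sigma0
p_upper <- pchisq(stat, df = n - 1, lower.tail = FALSE)  # H1: true sd > sigma0
```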

An additional difficulty is that I would need a two-way version of the test, because I have two grouping variables. Can you point me in the right direction or give me a hint? I don't have many ideas on where to search :-)

statistics r • 12k views

What is your non-normality assumption based on? Have you thought about transforming the data (with log transformation, for example) to be more normal?


You might also want to ask that question on stats.stackexchange.com. It's populated by lots of true-blooded statisticians who eat this stuff for breakfast.


Hi Giovanni, how did you end up solving this? I ran into a very similar problem.

11.7 years ago
Matt Parker ▴ 80

Is bootstrapping a possibility? Resample from your data with replacement, calculate the variance, repeat. This should leave you with a vector of bootstrapped variance estimates for each of your desired groups. Perform the appropriate test on those estimates (e.g., a t-test if you're comparing two groups and the estimates turn out to be normally distributed).

I think the boot package is the norm for resampling in R, but here's some untested code to clarify the idea:

# Simulated example data: y has a slightly larger spread than x
n <- 1000
x <- rnorm(mean = 0, sd = 1, n = n)
y <- rnorm(mean = 0, sd = 1.1, n = n)

nboots <- 10000
bootvar.x <- vector(mode = "numeric", length = nboots)
bootvar.y <- vector(mode = "numeric", length = nboots)

# Resample each group with replacement and store the variance of each resample
for(i in seq_len(nboots)){
  bootvar.x[i] <- var(sample(x, size = n, replace = TRUE))
  bootvar.y[i] <- var(sample(y, size = n, replace = TRUE))
}

library(ggplot2)
# Stack the estimates into one long data frame for plotting
# (probably a better way to do this)
bootvar.x2 <- data.frame(var = bootvar.x, group = "x")
bootvar.y2 <- data.frame(var = bootvar.y, group = "y")
bootvars <- rbind(bootvar.x2, bootvar.y2)

# Overlay the two bootstrap distributions of the variance
ggplot(bootvars, aes(x = var, group = group, colour = group)) + geom_density()

# Compare the two sets of bootstrapped variance estimates
t.test(bootvar.x, bootvar.y)

Disclaimer: I've read a bit about bootstrapping. Please don't assume I actually know anything. This is just a suggestion for something to check out.
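One caveat with running a t-test directly on the bootstrap replicates is that its p-value keeps shrinking as nboots grows, since the number of replicates is treated as the sample size. A commonly used alternative is a percentile interval on a statistic that compares the groups, such as the variance ratio. The following is only a sketch on simulated data; the sample sizes, the one-sided 95% level, and the variable names are assumptions for illustration:

```r
# Percentile bootstrap for the variance ratio var(x) / var(y).
# Simulated data; x is generated with the smaller spread.
set.seed(42)
x <- rnorm(200, sd = 1)
y <- rnorm(200, sd = 1.5)

nboots <- 2000
ratio <- replicate(nboots, {
  # Resample each group independently, with replacement
  var(sample(x, replace = TRUE)) / var(sample(y, replace = TRUE))
})

# One-sided 95% upper bound for the ratio; a bound below 1 is
# evidence that x is less variable than y
ci_upper <- quantile(ratio, 0.95)
ci_upper < 1
```

The boot package (boot() followed by boot.ci()) offers more refined interval types if the plain percentile interval turns out to be too crude.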

11.2 years ago
hurfdurf ▴ 490

If this data is really non-normal, should you be using variance or standard deviation at all?

You might want to use more robust metrics like the median absolute deviation.
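As a rough sketch of how that could be turned into a one-sided comparison, one option is a permutation test on the difference in MAD. This is an illustration on made-up skewed data, not an established named test; the centring step and the number of permutations are assumptions. Note that R's mad() multiplies by 1.4826 by default so that it is comparable to the standard deviation for normal data.

```r
# Permutation test on the difference in median absolute deviation (MAD).
# Simulated skewed data; x is generated with the smaller spread.
set.seed(7)
x <- rexp(100, rate = 2)
y <- rexp(100, rate = 1)

obs <- mad(x) - mad(y)          # negative if x is less spread out

# Centre each group on its own median so that only spread, not location,
# differs between the groups before their labels are shuffled
xc <- x - median(x)
yc <- y - median(y)
pooled <- c(xc, yc)

nperm <- 5000
perm <- replicate(nperm, {
  idx <- sample(length(pooled), length(xc))
  mad(pooled[idx]) - mad(pooled[-idx])
})

# One-sided p-value for H1: MAD of x is lower than MAD of y
p_less <- (sum(perm <= obs) + 1) / (nperm + 1)
```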

11.7 years ago

Look for the F-test or Bartlett's test. As your data is non-normal, you need something more robust against deviations from normality. Levene's test, for example, is mentioned as an alternative.
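In base R these are available in the stats package. A short sketch on simulated data (the data are made up; only var.test() takes a one-sided alternative, and both it and bartlett.test() are sensitive to non-normality, while fligner.test() is a rank-based option that is less so):

```r
# Variance tests available in base R (stats package); simulated data.
set.seed(11)
x <- rnorm(60, sd = 1)
y <- rnorm(60, sd = 2)

# F-test, one-sided: H1 is var(x) / var(y) < 1 (assumes normal data)
ft <- var.test(x, y, alternative = "less")

# Bartlett's test of equal variances (two-sided only, assumes normal data)
bt <- bartlett.test(list(x, y))

# Fligner-Killeen test, a rank-based test of equal variances (two-sided only)
fk <- fligner.test(list(x, y))

c(F = ft$p.value, Bartlett = bt$p.value, Fligner = fk$p.value)
```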


Thanks. I forgot to say that I also looked at Bartlett's test, but discarded it because it is sensitive to departures from normality and my data is not normal. Thanks anyway.


Then the Brown-Forsythe test, maybe? Look at the section: "Comparison with Levene's test"

11.2 years ago

Another alternative:

Transform the data by subtracting the mean (or median) from each data point and take the absolute values.

Now check the normality of each sample again and use a t-test or KS test as appropriate.
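This is essentially the idea behind the Levene and Brown-Forsythe tests, with the advantage that the final test can be made one-sided. A minimal sketch on simulated heavy-tailed data (the data and variable names are assumptions for illustration):

```r
# Compare spread via absolute deviations from each group's median.
# Simulated heavy-tailed data; y is generated with twice the spread of x.
set.seed(3)
x <- rt(80, df = 5)
y <- 2 * rt(80, df = 5)

dx <- abs(x - median(x))
dy <- abs(y - median(y))

# One-sided tests of H1: deviations in x are smaller than those in y
res_t <- t.test(dx, dy, alternative = "less")        # Welch t-test
res_w <- wilcox.test(dx, dy, alternative = "less")   # rank-based alternative

c(t = res_t$p.value, wilcoxon = res_w$p.value)
```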

11.7 years ago

You can try a Friedman test first for each factor (assuming they're independent) and, if there really is some difference, proceed with appropriate multiple hypothesis testing, using the Bonferroni method, for example. Not sequential hypothesis testing like we usually do with microarray data. You'll need to specify all concurrent hypotheses (variance =, <, >) and significance/power levels.

I don't know much about your experimental/test design. You could provide additional details.
