Question: Test Whether The Variance In A Group Is Lower Than In Another
4
Giovanni M Dall'Olio27k wrote:

I have two groups of data (not distributed under a normal distribution): I would like to test the hypothesis that the first group has a lower (or narrower) standard deviation than the other.

An alternative explanation to this is that I would like to tell whether the first group is less 'variable', 'heterogeneous', than the first.

A kruskal-wallis won't do it because it compares the medians of two or more groups, and I am not interested in that.

A Levene or a Brown-Forsynth test compare the variance between the two groups and tell whether they have the same variance. This is better, but I would also like to tell if the variance in the first group is lower than in the other(s) group(s).

A simple Chi-Square test would tell me whether the standard deviation of a group is equal to a certain value, and the one-tailed version can tell me whether it is higher/lower.

An additional difficulty is that I would have to do this test as a two-way, because I have two grouping variables, but I would like to ask you if you can point me to any direction or give me some hint, I have not many ideas on where to search :-)

R statistics • 11k views
modified 10.3 years ago by Alastair Kerr5.2k • written 10.3 years ago by Giovanni M Dall'Olio27k

What is your non-normality assumption based on? Have you thought about transforming the data (with log transformation, for example) to be more normal?

You might also want to ask that question on stats.stackexchange.com. It's populated by lots of true-blooded statisticians who eat this stuff for breakfast.

Hi Giovanni, how did you end up solving this? I ran into a very similar problem.

8
Matt Parker80 wrote:

Is bootstrapping a possibility? Resample from your data, calculate the variance, repeat. This should leave you with a vector of bootstrapped variance estimates for each of your desired groups. Perform the appropriate test on those estimates (e.g., t-test if you're comparing two groups and the estimates turn out normally).

I think the `boot` package is the norm for resampling in R, but here's some untested code to clarify the idea:

``````n <- 1000
x <- rnorm(mean = 0, sd = 1, n = n)
y <- rnorm(mean = 0, sd = 1.1, n = n)

nboots <- 10000
bootvar.x <- vector(mode = "numeric", length = nboots)
bootvar.y <- vector(mode = "numeric", length = nboots)

for(i in seq_len(nboots)){
bootvar.x[i] <- var(sample(x, size = n, replace = TRUE))
bootvar.y[i] <- var(sample(y, size = n, replace = TRUE))
}

require(ggplot2)
#Probably a better way to do this
bootvar.x2 <- data.frame(var = bootvar.x, group = "x")
bootvar.y2 <- data.frame(var = bootvar.y, group = "y")
bootvars <- rbind(bootvar.x2, bootvar.y2)

ggplot(bootvars, aes(x = var, group = group, colour = group)) + geom_density()

t.test(bootvar.x, bootvar.y)
``````

Disclaimer: I've read a bit about bootstrapping. Please don't assume I actually know anything. This is just a suggestion for something to check out.

3
hurfdurf460 wrote:

If this data is really non-normal, should you be using variance or standard deviation at all?

You might want to use more robust metrics like [?]median absolute deviation[?].

2
Michael Dondrup47k wrote:

Look for the F-test or Bartlett's test. As your data is non-normal you need something more robust against deviation from normality. Leven's test is for example mentioned as an alternative

thanks, I forgot to say that I also looked at the Bartlett's test, but discarded it because it is sensitive to departures from normality and my data is not normal. Thanks anyway.

Then Forsythe test maybe? Look at the section: "Comparison with Levene's test"

2
Alastair Kerr5.2k wrote:

Another alternative:

Transform the data by subtracting the mean (or median) from each data point and take the absolute values.

Now check the normality of each sample again and use a t-test or KS test as appropriate.

1
Jarretinha3.3k wrote:

You can try a Friedman test at first for each factor (assuming they're independent) and, given that really there is some difference, proceed an adequate multiple hypothesis testing using Bonferroni method, for example. Not a sequential hypothesis testing like we usually do with microarray data. You'll need to specifiy all concurrent hypothesis (variance =, <, >) and significance/power levels.