Fastest way to run separate two-way ANOVAs when you have a long list of dependent variables
9.7 years ago
p.apnts12 ▴ 20

Would it be possible to do a two-way ANOVA in an expedited way given that I have two factors (age and diet) but many dependent variables (184 different metabolites measured) which are not necessarily correlated? Or would I have to do the 2-way ANOVAs multiple times? MANOVA doesn't seem like the correct statistical test because the dependent variables may not be correlated. I have been using Prism and don't know if R would be able to do this or how to arrange the data.

R • 6.7k views

Thank you very much for your answer.

The data are arranged like this for the first metabolite (Ala):

        AL          CR
Young   164.955     247.136
        400.926     219.419
        223.974     95.28
        189.466     287.823
        300.871     221.316
        247.491     213.881
Old     122.158     244.36
        283.073     211.065
        319.116     277.264
        301.584     218.229
        292.163     397.107
        159.531     461.626
        322.171     199.855

I am not sure whether I would have to arrange the data like this for each of the remaining metabolites (Arg, Asn, Asp, Cit, ... up to all 184). It would be nice if R could do the two-way ANOVA, but I am not really sure how.
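For R, one common arrangement is "long" format: one row per animal, one column per factor, and one column per metabolite. The sketch below rebuilds the Ala table above that way and runs a single two-way ANOVA on it. The data frame name `ala` and the column names `age`, `diet` and `Ala` are illustrative assumptions, not something fixed by Prism or R; further metabolites (Arg, Asn, ...) would simply become extra columns.

```r
# Values copied from the Ala table above.
# The names `ala`, `age`, `diet` and `Ala` are illustrative choices.
ala <- data.frame(
  age  = factor(rep(c("Young", "Old"), times = c(12, 14)),
                levels = c("Young", "Old")),
  diet = factor(c(rep(c("AL", "CR"), each = 6),    # Young: 6 AL, 6 CR
                  rep(c("AL", "CR"), each = 7))),  # Old:   7 AL, 7 CR
  Ala  = c(
    164.955, 400.926, 223.974, 189.466, 300.871, 247.491,           # Young, AL
    247.136, 219.419,  95.280, 287.823, 221.316, 213.881,           # Young, CR
    122.158, 283.073, 319.116, 301.584, 292.163, 159.531, 322.171,  # Old, AL
    244.360, 211.065, 277.264, 218.229, 397.107, 461.626, 199.855   # Old, CR
  )
)

# Two-way ANOVA with interaction for this one metabolite
fit <- aov(Ala ~ age * diet, data = ala)
print(summary(fit))
```

Note that the groups are not the same size (6 Young vs 7 Old per diet), and `aov()` reports sequential sums of squares, so with unbalanced data the order of the terms in the formula can affect the table.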


Thank you for your suggestions.

9.7 years ago

It is surely possible, but without more detail about your question and the input/output format it is difficult to give a definitive answer. Here is a very crude way to address your problem: if your dependent variables are in a list, you could loop through that list.

Sample data:

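## Note: age is numeric here, so aov() treats it as a continuous covariate.
## For Young/Old groups, make it a factor instead.
age<- 1:10
diet<- factor(c('A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'))
## One list element per dependent variable (metabolite)
dependent_vars<- list(
    y1= rnorm(n= 10),
    y2= rnorm(n= 10),
    y3= rnorm(n= 10)
    ## Etc... till 184
)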

## Loop through list:
for(y in names(dependent_vars)){
    ymod<- summary(aov(dependent_vars[[y]] ~ age * diet))
    cat(paste('\nDependent var:', y, '\n'))
    print(ymod)
}

Output:

Dependent var: y1
            Df Sum Sq Mean Sq F value Pr(>F)
age          1  0.858   0.858   0.456  0.525
diet         1  4.310   4.310   2.291  0.181
age:diet     1  0.002   0.002   0.001  0.978
Residuals    6 11.289   1.881               

Dependent var: y2
            Df Sum Sq Mean Sq F value Pr(>F)
age          1  0.209  0.2088   0.278  0.617
diet         1  0.449  0.4486   0.598  0.469
age:diet     1  0.015  0.0145   0.019  0.894
Residuals    6  4.500  0.7501               

Dependent var: y3
            Df Sum Sq Mean Sq F value Pr(>F)
age          1  0.005  0.0046   0.003  0.956
diet         1  0.442  0.4424   0.319  0.593
age:diet     1  0.258  0.2576   0.186  0.681
Residuals    6  8.315  1.3859

The ANOVA tables are only printed to standard output rather than stored, so this might not be very useful for further processing. Most important: make sure it does what you need!
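If the printed tables are not enough, the p-values can be collected into one table instead. This is a minimal sketch under some assumptions: both factors are coded as two-level factors (Young/Old and AL/CR, as in the question), the data are simulated, and the helper `get_pvalues` and the object `results` are hypothetical names introduced here.

```r
set.seed(1)  # only so the simulated example is reproducible

# Simulated stand-in data: 20 animals, two 2-level factors
age  <- factor(rep(c("Young", "Old"), each = 10))
diet <- factor(rep(c("AL", "CR"), times = 10))
dependent_vars <- list(y1 = rnorm(20), y2 = rnorm(20), y3 = rnorm(20))

# Hypothetical helper: fit one two-way ANOVA and return its three p-values
get_pvalues <- function(y) {
  tab <- summary(aov(y ~ age * diet))[[1]]  # the ANOVA table as a data frame
  p <- tab[["Pr(>F)"]][1:3]                 # rows: age, diet, age:diet
  names(p) <- c("age", "diet", "age:diet")
  p
}

# One row per dependent variable, one column per model term
results <- as.data.frame(t(sapply(dependent_vars, get_pvalues)))
print(results)
```

The resulting data frame can then be sorted, filtered, or written out with `write.csv(results, "anova_pvalues.csv")`.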


Please don't use a for loop like that in R. apply() is your friend and is vastly faster.


Indeed. But I think in this case most of the time is spent running aov() rather than looping so it shouldn't really matter. I'd choose the for loop as I find it more readable (?)
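For anyone who wants to check this on their own machine, here is a rough sketch that times both styles on 184 simulated variables. The data are random, `fit_one` is a hypothetical helper, and the timings will differ between machines, so no expected numbers are given.

```r
set.seed(1)  # simulated data, for illustration only

age  <- factor(rep(c("Young", "Old"), each = 10))
diet <- factor(rep(c("AL", "CR"), times = 10))
dependent_vars <- replicate(184, rnorm(20), simplify = FALSE)
names(dependent_vars) <- paste0("y", seq_along(dependent_vars))

# Hypothetical helper: one two-way ANOVA per dependent variable
fit_one <- function(y) summary(aov(y ~ age * diet))

# Version 1: explicit for loop filling a pre-allocated list
t_loop <- system.time({
  res_loop <- vector("list", length(dependent_vars))
  names(res_loop) <- names(dependent_vars)
  for (nm in names(dependent_vars)) {
    res_loop[[nm]] <- fit_one(dependent_vars[[nm]])
  }
})

# Version 2: lapply over the same list
t_lapply <- system.time(res_lapply <- lapply(dependent_vars, fit_one))

# Elapsed seconds for each version
print(c(loop = t_loop[["elapsed"]], lapply = t_lapply[["elapsed"]]))
```

Both versions make the same 184 calls to `aov()`, so any difference comes from the looping overhead alone.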

9.7 years ago
pld 5.1k

If you have access to a cluster or a machine with multiple processors/cores, check out the R package snow. It can run over MPI, but as far as I know plain socket clusters also work without MPI installed, and it is fairly trivial to use. You can basically swap out calls to apply with calls to the snow function parApply.

Instead of running through each ANOVA call in serial, you'll be able to do n at a time in parallel, where n is the number of processors/cores you have.
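A minimal sketch of that idea follows. Note the swap: it uses the `parallel` package that ships with R (its `makeCluster`, `parLapply` and `stopCluster` mirror the snow interface) rather than snow itself, with a local socket cluster of 2 workers. The data are simulated and `fit_one` is a hypothetical helper.

```r
library(parallel)  # ships with R; used here in place of snow

set.seed(1)  # simulated data, for illustration only
age  <- factor(rep(c("Young", "Old"), each = 10))
diet <- factor(rep(c("AL", "CR"), times = 10))
dependent_vars <- list(y1 = rnorm(20), y2 = rnorm(20), y3 = rnorm(20))

# Hypothetical helper; the factors are passed as arguments so that
# each worker receives them along with the data
fit_one <- function(y, age, diet) {
  summary(aov(y ~ age * diet))[[1]]
}

cl <- makeCluster(2)  # local socket cluster with 2 workers (assumed count)
res <- parLapply(cl, dependent_vars, fit_one, age = age, diet = diet)
stopCluster(cl)       # always release the workers when done

print(res[["y1"]])
```

With only 184 small models the cost of starting the workers may outweigh the gain, so it is worth timing the serial version first.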
