Question: R - Kruskal-Wallis test on multiple columns at once
mafernandez (Madrid, Spain) wrote 5 months ago:

This may sound a bit simple, but I cannot find the answer.

I have a dataset in R that has 26 samples in rows and many variables (>20) in columns. Some of them are categorical, so what I need to do is run a Kruskal-Wallis test on each numerical variable, grouped by each categorical one. So I do:

env_fact <- read.csv("environ_facts.csv")

kruskal.test(env_fact-1 ~ Categorical_var-1,  data=env_fact)

But with this I can only test the numerical variables one at a time, which is tedious.

Is there any way to carry out the Kruskal-Wallis tests for all numerical variables at once? I can repeat it for each categorical variable, since I only have 4, but I have more than 20 numerical ones!

Thanks a lot

modified 5 months ago • written 5 months ago by mafernandez

I don't see bioinformatic relevance, but might this help?

A Tutorial on Loops in R - Usage and Alternatives

Correct for multiple testing after this.

modified 5 months ago • written 5 months ago by Carambakaracho
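
On the multiple-testing point above: a minimal sketch, assuming the p-values from the individual tests have already been collected in a named vector. The values and variable names below are made up for illustration; `p.adjust` is part of base R (package stats).

```r
# Hypothetical raw p-values, one per numeric variable (made-up numbers).
p_raw <- c(temp = 0.004, pH = 0.030, salinity = 0.200, depth = 0.650)

# Benjamini-Hochberg controls the false discovery rate;
# method = "bonferroni" or "holm" would control the family-wise error rate instead.
p_bh <- p.adjust(p_raw, method = "BH")

# Show raw and adjusted values side by side.
print(data.frame(p_raw = p_raw, p_adjusted = p_bh))
```

Which correction is appropriate depends on the study; BH is used here only as an example.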

Hi, just as a side note, your problem might be "under-defined" to estimate variable importance because you have approximately the same number of observations and variables, 26 samples and ">20 variables". I am not sure if your statistics will be robust enough to draw reliable conclusions.

written 5 months ago by Michael Dondrup

Thanks Michael

If you try to analyse all the variables at once, the result may not be robust enough. However, what I want to do is test each variable separately, so we will have one variable and 26 observations, classified into groups by the other parameters (3 or more groups, so Kruskal-Wallis is needed instead of Mann-Whitney).

My problem is finding a command that does this for every variable at once, rather than writing the same code for each one (i.e. writing the same code 20 times).


written 5 months ago by mafernandez

As already suggested, this seems like it can be solved with a simple for loop or a vectorized equivalent. If not, then you'll have to explain why.

written 5 months ago by Jean-Karim Heriche
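
For readers wondering what a "vectorized equivalent" of the loop might look like, here is a rough sketch using `sapply`. It assumes a data frame shaped like the one described in the question; the synthetic data and the column names (`temp`, `pH`, `salinity`, `Categorical_var`) are placeholders, not the real contents of environ_facts.csv.

```r
# Synthetic stand-in for the real data: 26 samples, 3 numeric columns, 1 grouping column.
set.seed(1)
env_fact <- data.frame(
  Categorical_var = factor(rep(c("A", "B", "C"), length.out = 26)),
  temp     = rnorm(26),
  pH       = rnorm(26),
  salinity = rnorm(26)
)

# Select the numeric columns by type rather than by name.
num_cols <- names(env_fact)[sapply(env_fact, is.numeric)]

# One Kruskal-Wallis test per numeric column; keep only the p-value of each.
p_values <- sapply(num_cols, function(v) {
  kruskal.test(env_fact[[v]], env_fact$Categorical_var)$p.value
})

print(p_values)
```

The result is a named numeric vector with one p-value per numeric column, which is convenient for a later multiple-testing correction.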

As you may have guessed, I am very new to programming, so I do not know how to create a for loop with my data.

Let's say, I have variables (both numerical and categorical) in columns and samples in rows. Then I have to test each numeric variable based on each categorical one with the samples as the observations.

Then, I should try something like:

env_fact <- read.csv("environ_facts.csv")

for i in env_fact
kruskal.test(i ~ Categorical_var-1,  data=env_fact)


written 5 months ago by mafernandez

Almost. So this is an R programming question. Have a look at the tutorial linked above and/or any tutorial on R programming to get started. You need to know how to access variables in an R data frame. Look at something like this (not using the formula version of the function and assuming variables have names in column headers):

my.variables <- colnames(env_fact)
for(i in 1:length(my.variables)) {
    if(my.variables[i] == 'Categorical_var') {
        # skip the grouping column itself
    } else {
        # test column i against the grouping variable
        kruskal.test(env_fact[,i], env_fact$Categorical_var)
    }
}

written 5 months ago by Jean-Karim Heriche

Sounds great! However, I still cannot see all the Kruskal-Wallis tests.

If I try something similar to:

KW_test<-kruskal.test(env_fact[,i], env_fact$Categorical_var)

I only get one p-value, which I suppose corresponds to the test carried out as a whole and not variable by variable...

How could I get all the tests (i.e. each variable separately) on screen?

Thank you very much, your help is really moving my analysis forward!

written 5 months ago by mafernandez
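
A note on the behaviour described above: R does not auto-print values computed inside a for loop, and assigning to a single name such as KW_test keeps only one result. One possible way around both, sketched here on synthetic data, is to store each test in a named list and call `print()` explicitly. The list name `kw_results` and the columns `oxygen` and `nitrate` are hypothetical.

```r
# Synthetic stand-in for the real data (26 samples, 3 groups).
set.seed(42)
env_fact <- data.frame(
  Categorical_var = factor(rep(c("low", "mid", "high"), length.out = 26)),
  oxygen  = runif(26),
  nitrate = runif(26)
)

kw_results <- list()
for (v in setdiff(colnames(env_fact), "Categorical_var")) {
  # Store the full test object under the variable's name.
  kw_results[[v]] <- kruskal.test(env_fact[[v]], env_fact$Categorical_var)
  cat("Variable:", v, "\n")
  print(kw_results[[v]])  # explicit print() is needed inside a loop
}

# Individual results stay available afterwards, e.g. one p-value:
kw_results[["nitrate"]]$p.value
```

Each list element is a regular `htest` object, so the statistic, degrees of freedom and p-value can be pulled out per variable.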

I think you need to go and read about how to program in R to understand and reuse pieces of code. At this stage, I think it would be a disservice to code it entirely for you.

written 5 months ago by Jean-Karim Heriche

Thank you very much Jean-Karim.

I will try to figure out how to finish the analysis the way I need. As you can see, I am very new to R, so I could understand almost everything you've written, but I could not have written it myself, not even in a million years ;-)


written 5 months ago by mafernandez

Thanks, @carambakaracho

Yes, I did not point out the bioinformatics relevance of the topic.

I need this analysis for a microbial ecology study, to figure out which environmental factors influence (or do not influence) the community composition. I thought others might have the same question.

Nonetheless, I am going to try what you suggested and post the result afterwards.


written 5 months ago by mafernandez

Use the 'add reply' button to reply to a comment.

written 5 months ago by Jean-Karim Heriche