Question

Kruskal Wallis test, gene expression, RNA-seq

0

2.0 years ago

Rob ▴ 170

Hi friends,

Does anyone have R code for running the Kruskal-Wallis test on gene expression data?

My data has 60 columns and 230 rows.

Thanks

RNA-seq kruskal-wallis • 1.8k views

ADD COMMENT • link updated 23 months ago by Ram 43k • written 2.0 years ago by Rob ▴ 170

score 3 · Accepted Answer · 2022-05-01

3

2.0 years ago

Kevin Blighe 87k

Ah, the Kruskal-Wallis test (the non-parametric ANOVA), my favourite.

Here is some code for you:

In short, the function is kruskal.test()
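A minimal sketch of a single call, assuming R's built-in airquality dataset (an assumption, suggested by the Ozone column mentioned in the reply below, and not necessarily the code that was originally shared):

```r
# Assumption: the built-in 'airquality' data stands in for real data.
# Ozone is the measured value, Month is the grouping variable.
fit <- kruskal.test(Ozone ~ Month, data = airquality)

fit          # full test summary
fit$p.value  # just the p-value
```

The same pattern, value ~ group, carries over to one gene's expression tested across sample groups.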

As you will see, you will require some metadata whose rows are aligned to the columns of your expression data. The expression data itself has to be transposed, such that genes are columns and samples are rows. The metadata should contain one or more categorical variables, each with two or more groups (like disease|normal, group1|group2|group3, et cetera); the Kruskal-Wallis test is then computed on the expression of a given gene across those groups.
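The transpose-and-align step could be sketched as follows. Everything here is hypothetical toy data: the object names expr, meta, expr_t and mydata, the sample names, and the group labels are all assumptions made for illustration.

```r
# Hypothetical expression matrix: 3 genes (rows) x 6 samples (columns)
set.seed(1)
expr <- matrix(rnorm(18), nrow = 3,
  dimnames = list(paste0('gene', 1:3), paste0('sample', 1:6)))

# Hypothetical metadata: one row per sample, deliberately in a different order
meta <- data.frame(
  group = factor(c('B', 'A', 'C', 'A', 'B', 'C')),
  row.names = paste0('sample', c(5, 1, 3, 4, 2, 6)))

# transpose so that samples are rows and genes are columns
expr_t <- as.data.frame(t(expr))

# reorder the metadata rows to match the expression rows, then verify
meta <- meta[rownames(expr_t), , drop = FALSE]
stopifnot(identical(rownames(meta), rownames(expr_t)))

# combine into one data frame and test one gene across the groups
mydata <- cbind(meta, expr_t)
kruskal.test(gene1 ~ group, data = mydata)$p.value
```

Reordering the metadata by sample name, rather than trusting that both files are in the same order, guards against silently testing genes against the wrong group labels.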

Kevin

ADD COMMENT • link 2.0 years ago by Kevin Blighe 87k

0

Thank you, Kevin. The code you shared is for one column (Ozone). My problem is that I don't know how to adapt it for multiple columns. Is there any way?

ADD REPLY • link 2.0 years ago by Rob ▴ 170

4

There are a few different ways to do that. Here is a reproducible example that eventually outputs a named vector of p-values, which I trust that you can adapt to your own data:

# random data: the values below will differ between runs unless a seed is set with set.seed()
mydata <- data.frame(
  group = c(rep(c('A','B','C'), 6)),
  gene1 = runif(n = 18, min = -10, max = 10),
  gene2 = runif(n = 18, min = -10, max = 10),
  gene3 = runif(n = 18, min = -10, max = 10))
mydata$group <- factor(mydata$group, levels = c('A','B','C'))
mydata
   group      gene1      gene2       gene3
1      A -6.5828614  7.2061653 -4.78849342
2      B  8.0654531  6.8851541 -6.66095607
3      C  0.9716438  9.7684197  5.91039503
4      A  0.2177755 -9.5862999 -9.21830310
5      B  7.1282572 -6.5170962  0.72348540
6      C  4.6401339  5.1515160 -9.60944901
7      A  2.3342942 -3.3071833 -2.87845721
8      B  1.8221881  4.3000686  1.13610765
9      C  3.6113445 -2.5834670 -9.50316515
10     A  8.0464613 -1.1339911  3.40090883
11     B  8.6698397 -5.4779950 -6.58398749
12     C  7.8709801 -5.7729320 -1.81991194
13     A  6.7190791 -4.4822967 -8.43157336
14     B  5.0844176 -5.3323176 -0.08187998
15     C -7.9576522  1.5789440 -0.64270702
16     A  3.6994011 -3.8620588  9.41000514
17     B  4.7855473 -7.2345133 -8.80236299
18     C  7.5792729  0.5457233  3.74570574

# run the test for a single gene
kruskal.test(gene1 ~ group, data = mydata)
# extract only the p-value
kruskal.test(gene1 ~ group, data = mydata)$p.value
[1] 0.2778423

genes <- colnames(mydata)[-1]
genes
[1] "gene1" "gene2" "gene3"

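# run the test once per gene and collect the p-values
res <- sapply(genes, function(x) {
  # build the formula from the column name, e.g. gene1 ~ group
  f <- as.formula(paste0(x, ' ~ group'))
  model <- kruskal.test(f, data = mydata)
  model$p.value
})
# sapply() already names the result after 'genes'; this just makes it explicit
names(res) <- genes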
res
    gene1     gene2     gene3 
0.2778423 0.3469830 0.9941691

We're unlucky this time: in this random data, generated on this glorious Sunday of May 1st 2022, nothing was statistically significant.
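When many genes are tested at once, as with the 230 rows in the question, raw p-values are commonly adjusted for multiple testing before significance is judged. A minimal sketch using p.adjust() on the p-values printed above; the Benjamini-Hochberg method and the name padj are assumptions, not something specified in this thread:

```r
# p-values copied from the output above
res <- c(gene1 = 0.2778423, gene2 = 0.3469830, gene3 = 0.9941691)

# Benjamini-Hochberg adjustment (assumed choice; see ?p.adjust for others)
padj <- p.adjust(res, method = 'BH')
padj

# genes passing a 5% false discovery rate threshold, if any
names(padj)[padj < 0.05]
```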

ADD REPLY • link 2.0 years ago by Kevin Blighe 87k

0

Thanks Kevin. I did not understand: do we have one data file and one metadata file, or just one file?

So, I tried to run it with my data file and I got this error:

code chunk:

res <- sapply(genes, function(x) {
  f <- as.formula(paste0(x, ' ~ group'))
  model <- kruskal.test(f, data = data)
  p <- model$p.value
  p
})

Error in eval(predvars, data, env) : object 'ERVV' not found

ADD REPLY • link 2.0 years ago by Rob ▴ 170

0

You need to figure this out on your own, sorry. Somebody else may also be able to help.

ADD REPLY • link 2.0 years ago by Kevin Blighe 87k
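
One possible cause of the "object 'ERVV' not found" error above, offered as a guess rather than a diagnosis: if a column name contains a hyphen (for example a gene symbol such as ERVV-1), then pasting it into a formula makes R read it as ERVV minus 1. A minimal sketch of two workarounds follows; the column names and data are hypothetical, and it assumes the data frame also contains a group column.

```r
# Hypothetical data with a hyphenated gene name
set.seed(1)
mydata <- data.frame(
  group = factor(rep(c('A', 'B', 'C'), 6)),
  `ERVV-1` = runif(18),
  gene2 = runif(18),
  check.names = FALSE)  # keep 'ERVV-1' instead of renaming it to 'ERVV.1'
genes <- setdiff(colnames(mydata), 'group')

# Option 1: backtick-quote the column name inside the formula
res1 <- sapply(genes, function(x) {
  f <- as.formula(paste0('`', x, '` ~ group'))
  kruskal.test(f, data = mydata)$p.value
})

# Option 2: skip the formula and pass the values and the groups directly
res2 <- sapply(genes, function(x) {
  kruskal.test(mydata[[x]], mydata$group)$p.value
})

res1
```

Both options return the same named vector of p-values; the second avoids building formulas from strings altogether.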