Question

Multiple comparisons in proteomic data

0

Entering edit mode

4.0 years ago

Peter ▴ 20

Hello. I have a question about the application of the dunn.test function. I'm new to R and sorry if this question is basic. I have cancer proteomics data, I have four groups: CTRL (n = 8), GBM1 (n = 6), GBM2 (n = 6), GBM3 (n = 6).

I need to know which proteins are regulated differently between groups. After reading it, I found out that I have to apply a non-parametric test and do a post hoc analysis to find out the comparisons in pairs.

Well, I started working with my data as follows:

GROUP     P04899     P04406    Q93077
CTRL
CTRL
GBM1
GBM2
GBM3

I have over 2000 proteins

I started from the basics and saw that I have to apply this command:

A1 <- dunnTest (GBMTable$P04899 ~GBMTable$ Group, method = "bh")

Dunn (1964) Kruskal-Wallis multiple comparison
  p-values adjusted with the Benjamini-Hochberg method.

  Comparison          Z     P.unadj      P.adj
1      CTRL - GBM1 -1.1322770 0.257517982 0.38627697
2      CTRL - GBM2 -2.1513264 0.031450449 0.09435135
3      GBM1 - GBM2 -1.0190493 0.308179547 0.36981546
4      CTRL -GBM3  0.7925939 0.428014450 0.42801445
5      GBM1 - GBM3  1.9248710 0.054245504 0.10849101
6      GBM2 - GBM3  2.9439203 0.003240835 0.01944501

But I wanted to do this for all proteins automatically, possibly using a "for" command, I believe. And I would like to save all the results in a file to open in excel.

Does anyone have a tip?

Thanks!

R • 713 views

ADD COMMENT • link updated 4.0 years ago by Kevin Blighe 87k • written 4.0 years ago by Peter ▴ 20

score 2 · Answer 1 · 2020-04-25

Hi, nice to see somebody else who uses Dunn's test. I happened to have some data open in my terminal.

There are different ways to do this, via a standard for() loop, apply(), lapply(), or parallelised functions like foreach(), mclapply(), or parLapply().

dunnTest() does not have a large footprint or computational cost, so, we can just use lapply():

1, the data:

modeling[1:5,1:5]
  DiseaseStatus     gene1      gene2      gene3
  Disease_3         -8.484839  -2.2394204 -0.6758937
  Disease_2         -8.304133  -1.9972881 -0.6571702
  Disease_2         -7.654713  -1.1625870 -0.3728583
  Disease_2         -6.077792   0.5142025  0.2103687
  Disease_3         -8.232873  -2.3506886 -0.6251163
  gene4
  -1.7088455
  -1.6124870
  -1.7861746
  -0.1267115
  -1.3069726

dim(modeling)
[1] 81 94

2, define variables to test

genes <- colnames(modeling)[2:ncol(modeling)]
head(genes)
[1] "gene1" "gene2" "gene3" "gene4" "gene5" "gene6"

3, perform Dunn's test and store results in a list

require(FSA)
Carregando pacotes exigidos: FSA
## FSA v0.8.30. See citation('FSA') if used in publication.
## Run fishR() for related website and fishR('IFAR') for related book.
Attaching package: ‘FSA’

result <- lapply(genes,
  function(x) {
    dunnTest(as.formula(paste0(x, '~ DiseaseStatus')),
      data = modeling, method = 'bh')$res
  })

# name the list items by the variables tested
names(result) <- genes
result[[1]]

4, bind the list objects together into a 'master' results table, and tidy rownames

# old way via do.call()
result_final <- do.call(rbind, result)
result_final <- data.frame(gene = gsub('\\.[0-9]*$', '', rownames(result_final)), result_final)

# 'modern' way would be via *data.table::rbindlist*

head(result_final)
 gene  Comparison             Z          P.unadj   P.adj
 gene1 Disease_1 - Disease_2 -0.02897449 0.9768849 0.9768849
 gene1 Disease_1 - Disease_3  0.49975886 0.6172449 0.7122056
 gene1 Disease_2 - Disease_3  0.63644611 0.5244857 0.7867285
 gene1 Disease_1 - Healthy_1  0.91243067 0.3615421 0.6025701
 gene1 Disease_2 - Healthy_1  1.09102134 0.2752635 0.6881587
 gene1 Disease_3 - Healthy_1  0.49975886 0.6172449 0.7715561

tail(result_final)
 gene   Comparison             Z          P.unadj   P.adj
 gene93 Healthy_1 - Healthy_2 -0.05961193 0.9524647 0.9524647
 gene93 Disease_1 - Healthy_3 -1.34851137 0.1774940 0.8874698
 gene93 Disease_2 - Healthy_3  0.61426075 0.5390430 0.6738038
 gene93 Disease_3 - Healthy_3  1.14281522 0.2531153 0.4745912
 gene93 Healthy_1 - Healthy_3 -0.15156180 0.8795326 1.0000000
 gene93 Healthy_2 - Healthy_3 -0.10310063 0.9178831 0.9834462

5, write out to TSV

write.table(result_final,
  'DunnsTestresults.tsv',
  row.names = FALSE, sep = '\t', quote = FALSE)

If there are NA values in your data, it will cause an error somewhere. So, sort those out before running dunnTest().

Kevin