Question

One sample Wilcoxon test for each boxplot ggplot2

1

Entering edit mode

5.0 years ago

jordi.planells ▴ 480

Hi folks! I need your help with a ggplot2 representation.

I have a data set that has the following structure:

log2FoldChange      Sequence_biotype      Knockdown     
-1.40               LTR                   A 
-1.11               DNA                   B
-3.46               Protein               A
-1.25               Protein               C
 1.03               DNA                   B
 ...                ...                   ...

I am plotting the foldChanges as boxplots, one for each Sequence_biotype. I have the knockdown variable faceted. What I'm trying to do is to do a one sample Wilcoxon test for each boxplot, comparing the log2FoldChange to 0 (to see if there is a significant change). That can be achieved with the following code:

wilcox.test(x = data, mu = 0)

when the data is grouped by Sequence_biotype and Knockdown.

My question is, how could I introduce the results of the wilcox test to the plot as labels or how could I compute it directly in the plot (I have been trying with stat_compare_means(method = "wilcox", paired = FALSE) from ggpubr package but all the pvalues are piling up to the same spot)

Thank you before hand, any help is appreciated!

Best,

Jordi

R ggplot2 • 5.9k views

ADD COMMENT • link 5.0 years ago by jordi.planells ▴ 480

0

Entering edit mode

Hi,

Can you provide the code that you've tried as well the figure that you got?

António

ADD REPLY • link 5.0 years ago by antonioggsousa 3.4k

0

Entering edit mode

Sure! This is the code and the plot I currently have

com = ggboxplot(data  = com_sig.df, x = "Sequence_biotype", y = "log2FoldChange", fill = "Sequence_biotype",
                 facet.by = "Knockdown") +
      geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
      theme_bw() +
      theme(legend.position = "none",
            axis.title.x = element_text(hjust = .975, vjust = -.6),
            plot.title = element_text(hjust = .5),
            axis.text.x = element_blank(),
            strip.background = element_blank(),
            strip.text.x = element_blank()) +
      stat_compare_means(method = "wilcox.test")
com

plot_generated

ADD REPLY • link 5.0 years ago by jordi.planells ▴ 480

0

Entering edit mode

Hi! An expansion of y axis may help to avoid piling up stat_compare_means() results. Alternatively, results of Wilcoxon test may be added on the plot using ggpubr::stat_pvalue_manual(). In such cases, I compute Wilcoxon using ggpubr::compare_means() - maybe because it is directly compatible with ggpubr::stat_pvalue_manual(), I don't recall why)))

ADD REPLY • link 5.0 years ago by Alex Nesmelov ▴ 200

0

Entering edit mode

I have tried the compare_means approach and it works, I obtain the kind of data that I am looking for. However, I still don't know how to embed the data into the plot, I tried with stat_pvalue_manual() but I have the same problem as with stat_compare_means(), that the values are piled up on top of each other. Any clue why?

ADD REPLY • link 5.0 years ago by jordi.planells ▴ 480

0

Entering edit mode

I guess that the problem arises because stat_compare_means compares all groups against all groups - there are too many values to plot. If you specify ref.group or comparisons, p-values will move to the respective bars. As far as I know, it will require an explicit plotting of compared groups on x-axis.

ADD REPLY • link 5.0 years ago by Alex Nesmelov ▴ 200

0

Entering edit mode

Yes indeed, if I specify a reference group all the labels move to the right place! The problem is that I don't want any reference group, but to compare it to mu=0... It is a wilcoxon one sample test what I would like to do

ADD REPLY • link 5.0 years ago by jordi.planells ▴ 480

0

Entering edit mode

There should be one p-value for each facet, right?

ADD REPLY • link 5.0 years ago by Alex Nesmelov ▴ 200

0

Entering edit mode

One p-value for each biotype (or boxplot) in each facet. In other words, if I have 6 boxplots per facet, I should get 6 p-values per facet

ADD REPLY • link 5.0 years ago by jordi.planells ▴ 480

1

Entering edit mode

If so, I can suggest making a table with p-values calculated for the corresponding Sequence_biotype and Knockdown type, then add arbitrary y-axis (log2FoldChange) value higher than its actual values to place p-values on top of the bars (say, 5) and finally add this table to the plot with geom_text(data = your_table, aes(label = p_values)). The presence of biotypes, facetting variable and y-axis values will take care of proper p-value positioning. Since so many p-values will overlap, you can 1) rotate p-values or a whole plot 2) incrementally increase y-axis value for each next p-value. For 6 Sequence_biotypes and 3 Knockdown types sorted in accordance with your plot, it will be simply like rep(seq(5, 7, length.out = 6), 3).

ADD REPLY • link 5.0 years ago by Alex Nesmelov ▴ 200

0

Entering edit mode

Thank you so much! This work smoothly! I am pasting the code with the answer and closing the question

ADD REPLY • link 5.0 years ago by jordi.planells ▴ 480

score 5 · Accepted Answer · 2020-07-09

In case anyone faces the same problem, I will paste the code I used to get the plot.

# Subset the dataset into different knockdowns
A = com_sig.df[com_sig.df$Knockdown=="A",]
B = com_sig.df[com_sig.df$Knockdown=="B",]
C = com_sig.df[com_sig.df$Knockdown=="C",]

# Compute the test for the first knockdown and the position in the plot
stat.test_A = compare_means(log2FoldChange ~ 1, paired = FALSE, data = A, method = "wilcox.test", 
                          group.by = "Sequence_biotype", mu = 0) %>%
                 mutate(y.position = 4.5)

# Add a label to identify the knockdown
stat.test_A$Knockdown = "A"

# Same for second knockdown
stat.test_B = compare_means(log2FoldChange ~ 1, paired = FALSE, data = B, method = "wilcox.test", 
                          group.by = "Sequence_biotype", mu = 0) %>%
                 mutate(y.position = 4.5)
stat.test_B$Knockdown = "B"

# Same for the third one
stat.test_C = compare_means(log2FoldChange ~ 1, paired = FALSE, data = C, method = "wilcox.test", 
                          group.by = "Sequence_biotype", mu = 0) %>%
                 mutate(y.position = 4.5)
stat.test_C$Knockdown = "C"

# Combine all 3 datasets into 1
stat.test = do.call(rbind, list(stat.test_A, stat.test_B, stat.test_C))

# change column name of the variable used to compute the wilcox test
names(stat.test)[names(stat.test)==".y."] = "log2FoldChange"

com = ggboxplot(data  = com_sig.df, x = "Sequence_biotype", y = "log2FoldChange", fill = "Sequence_biotype",
                 facet.by = "Knockdown") +
      geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
      theme_bw() +
      theme(legend.position = "none",
            axis.title.x = element_text(hjust = .975, vjust = -.6),
            plot.title = element_text(hjust = .5),
            axis.text.x = element_blank(),
            strip.background = element_blank(),
            strip.text.x = element_blank()) +
      geom_text(data = stat.test, aes(y=y.position,label = p.signif))
com

Plot Generated