Question: One sample Wilcoxon test for each boxplot ggplot2
0
gravatar for jordi.planells
5 weeks ago by
jordi.planells220 wrote:

Hi folks! I need your help with a ggplot2 representation.

I have a data set that has the following structure:

log2FoldChange      Sequence_biotype      Knockdown     
-1.40               LTR                   A 
-1.11               DNA                   B
-3.46               Protein               A
-1.25               Protein               C
 1.03               DNA                   B
 ...                ...                   ...

I am plotting the foldChanges as boxplots, one for each Sequence_biotype. I have the knockdown variable faceted. What I'm trying to do is to do a one sample Wilcoxon test for each boxplot, comparing the log2FoldChange to 0 (to see if there is a significant change). That can be achieved with the following code:

wilcox.test(x = data, mu = 0)

when the data is grouped by Sequence_biotype and Knockdown.

My question is, how could I introduce the results of the wilcox test to the plot as labels or how could I compute it directly in the plot (I have been trying with stat_compare_means(method = "wilcox", paired = FALSE) from ggpubr package but all the pvalues are piling up to the same spot)

Thank you before hand, any help is appreciated!

Best,

Jordi

ggplot2 R • 211 views
ADD COMMENTlink modified 5 weeks ago • written 5 weeks ago by jordi.planells220

Hi,

Can you provide the code that you've tried as well the figure that you got?

António

ADD REPLYlink written 5 weeks ago by antonioggsousa910

Sure! This is the code and the plot I currently have

com = ggboxplot(data  = com_sig.df, x = "Sequence_biotype", y = "log2FoldChange", fill = "Sequence_biotype",
                 facet.by = "Knockdown") +
      geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
      theme_bw() +
      theme(legend.position = "none",
            axis.title.x = element_text(hjust = .975, vjust = -.6),
            plot.title = element_text(hjust = .5),
            axis.text.x = element_blank(),
            strip.background = element_blank(),
            strip.text.x = element_blank()) +
      stat_compare_means(method = "wilcox.test")
com

plot_generated

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by jordi.planells220

Hi! An expansion of y axis may help to avoid piling up stat_compare_means() results. Alternatively, results of Wilcoxon test may be added on the plot using ggpubr::stat_pvalue_manual(). In such cases, I compute Wilcoxon using ggpubr::compare_means() - maybe because it is directly compatible with ggpubr::stat_pvalue_manual(), I don't recall why)))

ADD REPLYlink written 5 weeks ago by Alex Nesmelov110

I have tried the compare_means approach and it works, I obtain the kind of data that I am looking for. However, I still don't know how to embed the data into the plot, I tried with stat_pvalue_manual() but I have the same problem as with stat_compare_means(), that the values are piled up on top of each other. Any clue why?

ADD REPLYlink written 5 weeks ago by jordi.planells220

I guess that the problem arises because stat_compare_means compares all groups against all groups - there are too many values to plot. If you specify ref.group or comparisons, p-values will move to the respective bars. As far as I know, it will require an explicit plotting of compared groups on x-axis.

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by Alex Nesmelov110

Yes indeed, if I specify a reference group all the labels move to the right place! The problem is that I don't want any reference group, but to compare it to mu=0... It is a wilcoxon one sample test what I would like to do

ADD REPLYlink written 5 weeks ago by jordi.planells220

There should be one p-value for each facet, right?

ADD REPLYlink written 5 weeks ago by Alex Nesmelov110

One p-value for each biotype (or boxplot) in each facet. In other words, if I have 6 boxplots per facet, I should get 6 p-values per facet

ADD REPLYlink written 5 weeks ago by jordi.planells220
1

If so, I can suggest making a table with p-values calculated for the corresponding Sequence_biotype and Knockdown type, then add arbitrary y-axis (log2FoldChange) value higher than its actual values to place p-values on top of the bars (say, 5) and finally add this table to the plot with geom_text(data = your_table, aes(label = p_values)). The presence of biotypes, facetting variable and y-axis values will take care of proper p-value positioning. Since so many p-values will overlap, you can 1) rotate p-values or a whole plot 2) incrementally increase y-axis value for each next p-value. For 6 Sequence_biotypes and 3 Knockdown types sorted in accordance with your plot, it will be simply like rep(seq(5, 7, length.out = 6), 3).

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by Alex Nesmelov110

Thank you so much! This work smoothly! I am pasting the code with the answer and closing the question

ADD REPLYlink written 5 weeks ago by jordi.planells220
4
gravatar for jordi.planells
5 weeks ago by
jordi.planells220 wrote:

In case anyone faces the same problem, I will paste the code I used to get the plot.

# Subset the dataset into different knockdowns
A = com_sig.df[com_sig.df$Knockdown=="A",]
B = com_sig.df[com_sig.df$Knockdown=="B",]
C = com_sig.df[com_sig.df$Knockdown=="C",]

# Compute the test for the first knockdown and the position in the plot
stat.test_A = compare_means(log2FoldChange ~ 1, paired = FALSE, data = A, method = "wilcox.test", 
                          group.by = "Sequence_biotype", mu = 0) %>%
                 mutate(y.position = 4.5)

# Add a label to identify the knockdown
stat.test_A$Knockdown = "A"

# Same for second knockdown
stat.test_B = compare_means(log2FoldChange ~ 1, paired = FALSE, data = B, method = "wilcox.test", 
                          group.by = "Sequence_biotype", mu = 0) %>%
                 mutate(y.position = 4.5)
stat.test_B$Knockdown = "B"

# Same for the third one
stat.test_C = compare_means(log2FoldChange ~ 1, paired = FALSE, data = C, method = "wilcox.test", 
                          group.by = "Sequence_biotype", mu = 0) %>%
                 mutate(y.position = 4.5)
stat.test_C$Knockdown = "C"

# Combine all 3 datasets into 1
stat.test = do.call(rbind, list(stat.test_A, stat.test_B, stat.test_C))

# change column name of the variable used to compute the wilcox test
names(stat.test)[names(stat.test)==".y."] = "log2FoldChange"

com = ggboxplot(data  = com_sig.df, x = "Sequence_biotype", y = "log2FoldChange", fill = "Sequence_biotype",
                 facet.by = "Knockdown") +
      geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
      theme_bw() +
      theme(legend.position = "none",
            axis.title.x = element_text(hjust = .975, vjust = -.6),
            plot.title = element_text(hjust = .5),
            axis.text.x = element_blank(),
            strip.background = element_blank(),
            strip.text.x = element_blank()) +
      geom_text(data = stat.test, aes(y=y.position,label = p.signif))
com

Plot Generated

ADD COMMENTlink written 5 weeks ago by jordi.planells220
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1413 users visited in the last hour