One sample Wilcoxon test for each boxplot ggplot2
1
1
Entering edit mode
17 months ago

Hi folks! I need your help with a ggplot2 representation.

I have a data set that has the following structure:

log2FoldChange      Sequence_biotype      Knockdown     
-1.40               LTR                   A 
-1.11               DNA                   B
-3.46               Protein               A
-1.25               Protein               C
 1.03               DNA                   B
 ...                ...                   ...

I am plotting the foldChanges as boxplots, one for each Sequence_biotype. I have the knockdown variable faceted. What I'm trying to do is to do a one sample Wilcoxon test for each boxplot, comparing the log2FoldChange to 0 (to see if there is a significant change). That can be achieved with the following code:

wilcox.test(x = data, mu = 0)

when the data is grouped by Sequence_biotype and Knockdown.

My question is, how could I introduce the results of the wilcox test to the plot as labels or how could I compute it directly in the plot (I have been trying with stat_compare_means(method = "wilcox", paired = FALSE) from ggpubr package but all the pvalues are piling up to the same spot)

Thank you before hand, any help is appreciated!

Best,

Jordi

R ggplot2 • 1.8k views
ADD COMMENT
0
Entering edit mode

Hi,

Can you provide the code that you've tried as well the figure that you got?

António

ADD REPLY
0
Entering edit mode

Sure! This is the code and the plot I currently have

com = ggboxplot(data  = com_sig.df, x = "Sequence_biotype", y = "log2FoldChange", fill = "Sequence_biotype",
                 facet.by = "Knockdown") +
      geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
      theme_bw() +
      theme(legend.position = "none",
            axis.title.x = element_text(hjust = .975, vjust = -.6),
            plot.title = element_text(hjust = .5),
            axis.text.x = element_blank(),
            strip.background = element_blank(),
            strip.text.x = element_blank()) +
      stat_compare_means(method = "wilcox.test")
com

plot_generated

ADD REPLY
0
Entering edit mode

Hi! An expansion of y axis may help to avoid piling up stat_compare_means() results. Alternatively, results of Wilcoxon test may be added on the plot using ggpubr::stat_pvalue_manual(). In such cases, I compute Wilcoxon using ggpubr::compare_means() - maybe because it is directly compatible with ggpubr::stat_pvalue_manual(), I don't recall why)))

ADD REPLY
0
Entering edit mode

I have tried the compare_means approach and it works, I obtain the kind of data that I am looking for. However, I still don't know how to embed the data into the plot, I tried with stat_pvalue_manual() but I have the same problem as with stat_compare_means(), that the values are piled up on top of each other. Any clue why?

ADD REPLY
0
Entering edit mode

I guess that the problem arises because stat_compare_means compares all groups against all groups - there are too many values to plot. If you specify ref.group or comparisons, p-values will move to the respective bars. As far as I know, it will require an explicit plotting of compared groups on x-axis.

ADD REPLY
0
Entering edit mode

Yes indeed, if I specify a reference group all the labels move to the right place! The problem is that I don't want any reference group, but to compare it to mu=0... It is a wilcoxon one sample test what I would like to do

ADD REPLY
0
Entering edit mode

There should be one p-value for each facet, right?

ADD REPLY
0
Entering edit mode

One p-value for each biotype (or boxplot) in each facet. In other words, if I have 6 boxplots per facet, I should get 6 p-values per facet

ADD REPLY
1
Entering edit mode

If so, I can suggest making a table with p-values calculated for the corresponding Sequence_biotype and Knockdown type, then add arbitrary y-axis (log2FoldChange) value higher than its actual values to place p-values on top of the bars (say, 5) and finally add this table to the plot with geom_text(data = your_table, aes(label = p_values)). The presence of biotypes, facetting variable and y-axis values will take care of proper p-value positioning. Since so many p-values will overlap, you can 1) rotate p-values or a whole plot 2) incrementally increase y-axis value for each next p-value. For 6 Sequence_biotypes and 3 Knockdown types sorted in accordance with your plot, it will be simply like rep(seq(5, 7, length.out = 6), 3).

ADD REPLY
0
Entering edit mode

Thank you so much! This work smoothly! I am pasting the code with the answer and closing the question

ADD REPLY
5
Entering edit mode
16 months ago

In case anyone faces the same problem, I will paste the code I used to get the plot.

# Subset the dataset into different knockdowns
A = com_sig.df[com_sig.df$Knockdown=="A",]
B = com_sig.df[com_sig.df$Knockdown=="B",]
C = com_sig.df[com_sig.df$Knockdown=="C",]

# Compute the test for the first knockdown and the position in the plot
stat.test_A = compare_means(log2FoldChange ~ 1, paired = FALSE, data = A, method = "wilcox.test", 
                          group.by = "Sequence_biotype", mu = 0) %>%
                 mutate(y.position = 4.5)

# Add a label to identify the knockdown
stat.test_A$Knockdown = "A"

# Same for second knockdown
stat.test_B = compare_means(log2FoldChange ~ 1, paired = FALSE, data = B, method = "wilcox.test", 
                          group.by = "Sequence_biotype", mu = 0) %>%
                 mutate(y.position = 4.5)
stat.test_B$Knockdown = "B"

# Same for the third one
stat.test_C = compare_means(log2FoldChange ~ 1, paired = FALSE, data = C, method = "wilcox.test", 
                          group.by = "Sequence_biotype", mu = 0) %>%
                 mutate(y.position = 4.5)
stat.test_C$Knockdown = "C"

# Combine all 3 datasets into 1
stat.test = do.call(rbind, list(stat.test_A, stat.test_B, stat.test_C))

# change column name of the variable used to compute the wilcox test
names(stat.test)[names(stat.test)==".y."] = "log2FoldChange"

com = ggboxplot(data  = com_sig.df, x = "Sequence_biotype", y = "log2FoldChange", fill = "Sequence_biotype",
                 facet.by = "Knockdown") +
      geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
      theme_bw() +
      theme(legend.position = "none",
            axis.title.x = element_text(hjust = .975, vjust = -.6),
            plot.title = element_text(hjust = .5),
            axis.text.x = element_blank(),
            strip.background = element_blank(),
            strip.text.x = element_blank()) +
      geom_text(data = stat.test, aes(y=y.position,label = p.signif))
com

Plot Generated

ADD COMMENT

Login before adding your answer.

Traffic: 3018 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6