Question: Calculate Pairwise Wilcox.Test For Several Categories And Plot Significance Into A Boxplot With Ggplot2
Biojl1.6k wrote:

I have an R dataset that looks pretty much like this one from diamonds:

``````library(ggplot2)  # provides the diamonds dataset
diamonds2 = subset(diamonds, cut!='Good' & cut!='Very Good', -c(table, x, y, z, clarity, depth, price))
``````

I want to make a boxplot like this one:

``````ggplot(diamonds2, aes(x=color, y=carat, col=cut))+geom_boxplot()
``````

Here comes the hard part. My idea is to run a pairwise wilcox.test on the distribution of the y variable (carat) between groups (cut), separately for each of the columns (color).

``````pairwise.wilcox.test(diamonds2$carat, interaction(diamonds2$cut, diamonds2$color), p.adjust.method = "bonferroni")
``````

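It's not very elegant because it creates a matrix with extra comparisons (every cut/color combination against every other), but that's the best I've got so far. I would like to prune it to the comparisons between cuts within the same color.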
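One way to prune the comparisons is to run the pairwise test separately inside each color, so that only cut-versus-cut comparisons within the same color are ever computed. The following is a minimal sketch under that assumption; the names `per_color` and `pvals` are made up for illustration, and applying a single Bonferroni correction across the retained comparisons is a design choice, not the only reasonable one.

```r
library(ggplot2)  # only needed for the diamonds dataset

diamonds2 <- subset(diamonds, cut != "Good" & cut != "Very Good",
                    select = c(carat, cut, color))
diamonds2$cut <- droplevels(diamonds2$cut)

# Run the pairwise test inside each color; no correction yet.
per_color <- lapply(split(diamonds2, diamonds2$color), function(d) {
  pw <- pairwise.wilcox.test(d$carat, d$cut, p.adjust.method = "none")
  m <- pw$p.value            # lower-triangular matrix of p-values
  keep <- !is.na(m)          # cells that hold an actual comparison
  data.frame(color  = d$color[1],
             group1 = colnames(m)[col(m)[keep]],
             group2 = rownames(m)[row(m)[keep]],
             p      = m[keep])
})
pvals <- do.call(rbind, per_color)

# One Bonferroni correction across all retained comparisons
pvals$p.adj <- p.adjust(pvals$p, method = "bonferroni")
pvals$signif <- pvals$p.adj < 0.05
```

The result is a long table with one row per color and pair of cuts, which is easier to feed into a plotting layer than the full matrix. Note that the correction here covers far fewer tests than the original call did, so the adjusted p-values will differ from those in the full matrix.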

Additionally, I would like to plot the results as asterisks, colored according to the two distributions being compared. For the first group of boxplots (D), I would like to plot 3 asterisks: a purple one (meaning red and blue are significantly different), a yellow one and a cyan one.

As for plotting the colored asterisks, I've been playing a bit with the geom_text function from ggplot2, but I can't figure out how to plot below the X axis or how to plot text in different colors.

UPDATE The real data is very similar to what I posted: frequencies of all amino acids in 3 different sets of genes. I can plot asterisks/stars with geom_text at a particular position, but I can't automate it so that significance is taken from the table I generated, and I also can't plot on the X axis, above the letter of the amino acid.
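A possible way to automate the stars is to keep the test results in a data frame and hand that data frame to a separate geom_text layer. The sketch below makes several assumptions: the star colors, the horizontal offsets and the names `pairs`, `stars`, `star_col` and `offset` are all invented for illustration, and the stars are drawn just above the axis inside the panel rather than outside it.

```r
library(ggplot2)

diamonds2 <- subset(diamonds, cut != "Good" & cut != "Very Good",
                    select = c(carat, cut, color))
diamonds2$cut <- droplevels(diamonds2$cut)

# Pairs to compare, with an assumed star color and x offset for each pair
pairs <- data.frame(group1   = c("Fair", "Fair", "Premium"),
                    group2   = c("Premium", "Ideal", "Ideal"),
                    star_col = c("purple", "gold", "cyan3"),
                    offset   = c(-0.25, 0, 0.25),
                    stringsAsFactors = FALSE)

# Cross join: one row per color and pair
stars <- merge(data.frame(color = levels(diamonds2$color),
                          stringsAsFactors = FALSE), pairs)

# Unpaired Wilcoxon (rank sum) test for every row, then Bonferroni
stars$p <- mapply(function(col, g1, g2) {
  d <- diamonds2[diamonds2$color == col, ]
  wilcox.test(d$carat[d$cut == g1], d$carat[d$cut == g2])$p.value
}, stars$color, stars$group1, stars$group2, USE.NAMES = FALSE)
stars$p.adj <- p.adjust(stars$p, method = "bonferroni")
stars <- stars[stars$p.adj < 0.05, ]

# Position of each star: numeric x on the discrete axis, y below the boxes
stars$x <- match(stars$color, levels(diamonds2$color)) + stars$offset
stars$y <- min(diamonds2$carat) - 0.15

p <- ggplot(diamonds2, aes(x = color, y = carat, colour = cut)) +
  geom_boxplot() +
  # colour is set outside aes() so it does not touch the cut colour scale
  geom_text(data = stars, aes(x = x, y = y), label = "*",
            colour = stars$star_col, size = 8, inherit.aes = FALSE)
```

Passing the colors as a plain vector outside aes() keeps them independent of the colour scale used for cut. To draw the stars outside the panel, under the axis, something like coord_cartesian(clip = "off") together with extra bottom margin would probably be needed; that part is not shown here.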

I did the significance stars for the first columns by hand with Gimp; this is how it should look.
Kevin Blighe43k wrote:

The Wilcoxon Signed Rank test itself is easy:

``````wilcox.test(..., paired=TRUE, ...)
``````

Kevin
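For reference, `paired = TRUE` selects the signed rank test, which needs matched observations of equal length, while the default `paired = FALSE` gives the rank sum (Mann-Whitney) test for independent groups. A small sketch with simulated data (the vectors `before` and `after` are made up for illustration) shows both calls:

```r
set.seed(1)
before <- rnorm(20, mean = 10, sd = 2)
after  <- before + rnorm(20, mean = 0.5, sd = 1)  # matched to 'before'

# Signed rank test: uses the within-pair differences
res_paired <- wilcox.test(before, after, paired = TRUE)

# Rank sum test: treats the two vectors as independent samples
res_unpaired <- wilcox.test(before, after)

res_paired$method
res_unpaired$method
```

Groups of different sizes, such as the cuts within one color of the diamonds data, cannot be passed with `paired = TRUE`, so the unpaired form would apply there.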

I have a question. I use the ggpubr library both to run these tests and to plot the results, with rlog values as input. When I run a Wilcoxon or Kruskal-Wallis (KW) test, the Y axis no longer matches my expression values; what I get looks rank-based. When I just draw a normal boxplot the range is about 0-15, but with ggpubr the range is 0-50. Why is that?

Hey Krushnach. I am not familiar with ggpubr. Is it doing some transformation / scaling on the data?

The normal boxplot (it seems the links break if I add the image inline): https://imgur.com/a/FVUQpcu

The boxplot with stats using ggpubr: https://imgur.com/a/bNfNCzK

I guess it's doing some transformation, though I'm not sure which; it may not be scaling.

Oh right, but the data-points do not actually differ (?). It looks like ggpubr has just added a whole lot of extra padding at the top. You can probably change the y-axis limits via ggpubr?
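As a rough sketch of changing the limits, the plain ggplot2 way is to add coord_cartesian(), which zooms the axis without dropping any data points. ggpubr plotting functions return ggplot objects, so the same addition should work on them, although that is not tested here. The diamonds data and the limits c(0, 3) are stand-ins chosen for illustration.

```r
library(ggplot2)

# Plain boxplot standing in for the plot with extra padding at the top
p <- ggplot(diamonds, aes(x = cut, y = carat)) +
  geom_boxplot()

# Zoom the y axis; the underlying data and statistics are unchanged
p_zoom <- p + coord_cartesian(ylim = c(0, 3))
```

Bear in mind that any annotation placed above the new upper limit, such as a p-value label, would be cut off, so the limit should leave room for it.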