I have an R dataset that looks pretty much like this one from diamonds:
library(ggplot2)
diamonds2 = subset(diamonds, cut != 'Good' & cut != 'Very Good', select = -c(table, x, y, z, clarity, depth, price))
I want to make a boxplot like this one:
ggplot(diamonds2, aes(x=color, y=carat, col=cut))+geom_boxplot()
Here comes the hard part. My idea is to run pairwise Wilcoxon tests (wilcox.test) comparing the distribution of the y variable (carat) between the groups (cut) within each of the columns (color).
pairwise.wilcox.test(diamonds2$carat, interaction(diamonds2$cut, diamonds2$color), p.adjust.method = "bonferroni")
It's not very elegant because it creates a matrix with extra comparisons (every cut/color combination against every other one, not only the cuts within the same color), but that's the best I have so far. I would like to prune it.
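A rough sketch of the pruning I have in mind, not a finished solution: run pairwise.wilcox.test once per color so that only within-color comparisons are produced, then stack the results into one long table. The names d2, pvals, group1, group2 and p are made up for illustration, and note that the Bonferroni correction is then applied within each color (3 comparisons) rather than across the whole matrix, which may or may not be what is wanted.

```r
library(ggplot2)  # only needed for the diamonds data set

# Same subset as above, as a plain data frame with the unused cut levels dropped
d2 <- subset(as.data.frame(diamonds), cut != "Good" & cut != "Very Good",
             select = c(carat, cut, color))
d2$cut <- factor(d2$cut, ordered = FALSE)

# One pairwise test per color; each call only compares the cuts inside that color
pvals <- do.call(rbind, lapply(levels(d2$color), function(col) {
  d   <- d2[d2$color == col, ]
  res <- pairwise.wilcox.test(d$carat, d$cut, p.adjust.method = "bonferroni")
  # Turn the lower-triangular p-value matrix into long format
  tab <- as.data.frame(as.table(res$p.value), stringsAsFactors = FALSE)
  names(tab) <- c("group1", "group2", "p")
  # The unused upper triangle is NA, so drop those rows
  cbind(color = col, tab[!is.na(tab$p), ])
}))
rownames(pvals) <- NULL

head(pvals)
```

This gives one row per color and pair of cuts, which looks easier to feed into a plotting layer than the full matrix.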
Additionally, I would like to show the results as asterisks whose color is the blend of the colors of the two distributions being compared. For the first group of boxplots (D), I would like to plot 3 asterisks: a purple one (red and blue are significantly different), a yellow one and a cyan one.
For the colored asterisks I've been experimenting with geom_text from ggplot2, but I can't figure out how to place text below the x axis or how to draw text in different colors.
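To make the target more concrete, here is a minimal sketch of the kind of plot I am after, under several assumptions of my own: the box colors (box_cols) and pair colors (pair_cols) are set by hand, the 0.05 cutoff and the y position of -0.15 are arbitrary, and the stars sit just above the axis inside the panel rather than truly below it. The per-row text colors are passed to geom_text as a plain vector outside aes(), so they do not interfere with the colour scale used by the boxplots.

```r
library(ggplot2)

d2 <- subset(as.data.frame(diamonds), cut != "Good" & cut != "Very Good",
             select = c(carat, cut, color))
d2$cut <- factor(d2$cut, ordered = FALSE)  # levels: Fair, Premium, Ideal

# Within-color pairwise tests in long format (one row per color and pair)
stars <- do.call(rbind, lapply(levels(d2$color), function(col) {
  d   <- d2[d2$color == col, ]
  res <- pairwise.wilcox.test(d$carat, d$cut, p.adjust.method = "bonferroni")
  tab <- as.data.frame(as.table(res$p.value), stringsAsFactors = FALSE)
  names(tab) <- c("g1", "g2", "p")
  cbind(color = col, tab[!is.na(tab$p), ])
}))

# Assumed colour scheme: each pair gets roughly the blend of its two box colours
box_cols  <- c(Fair = "red", Premium = "green3", Ideal = "blue")
pair_cols <- c("Premium-Fair" = "gold", "Ideal-Fair" = "purple",
               "Ideal-Premium" = "cyan3")
offsets   <- c("Premium-Fair" = -0.25, "Ideal-Fair" = 0, "Ideal-Premium" = 0.25)

stars$pair  <- paste(stars$g1, stars$g2, sep = "-")
stars$label <- ifelse(stars$p < 0.05, "*", "")  # arbitrary cutoff
# Numeric x position: index of the color on the axis plus a small shift per pair
stars$x     <- match(stars$color, levels(d2$color)) + unname(offsets[stars$pair])

p <- ggplot(d2, aes(x = color, y = carat, colour = cut)) +
  geom_boxplot() +
  scale_colour_manual(values = box_cols) +
  # Fixed per-row colours are given outside aes(), one value per row of stars
  geom_text(data = stars, aes(x = x, y = -0.15, label = label),
            colour = unname(pair_cols[stars$pair]), size = 8,
            inherit.aes = FALSE)
print(p)
```

Whether the stars can be pushed genuinely below the axis line (for example by turning off clipping) is the part I have not worked out.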
UPDATE: The real data are very similar to what I posted: frequencies of all amino acids in 3 different sets of genes. I can plot asterisks/stars with geom_text at a particular position, but I can't automate it so that significance is taken from the table I generated, and I also can't place them on the x axis, above the letter of the amino acid.
I drew the significance stars for the first columns with GIMP; this is what it should look like.