How can I adjust the Y-axis scale when making a relative abundance box plot?
3 months ago
ohtang7 ▴ 40

I am creating a relative abundance boxplot comparing two groups (pet, stray) across eight genera. However, the boxes in the resulting plot are squashed and hard to read, because the Y-axis values span a very wide range.

I assume that 1) removing the Y-axis outliers and 2) log-transforming the relative abundance data to rescale it would be good solutions.

What should I do if I have many 0 values when I apply the log10 transformation?

Is there a widely used R library that applies this kind of transformation automatically?

Original result: (image not shown)

My original R code is here:

library(ggplot2)
data <- read.csv("relative abundance raw data (putative pathogen).csv")
p <- ggplot(data, aes(x="Genus", y="Relative_abundance", fill="Group")) +
  geom_boxplot(position = position_dodge(width = 0.8), alpha = 0.8) +
  labs(title = "Relative Abundance Comparison", x = "Genus", y = "Relative Abundance", fill = "Group") +
  theme_minimal() +
  scale_fill_manual(values = c("stray" = "blue", "pet" = "red"))
p + geom_jitter(shape=16, position=position_jitter(0.2))

My raw data file can be downloaded here:

Please help me make a pretty box plot by adjusting the y-scale!

statistics box-plot R scale_adjustment logarithm • 493 views

You can try scale_y_sqrt() instead if you don't like the look of the log10 transformation. As an aside, it doesn't look like your fill variable is working as everything is the same grey...
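A rough sketch of both points, on made-up data rather than the poster's file. The column names come from the question; the values and the `toy` data frame are assumptions for illustration. One possible cause of the grey fill, assuming the code is as posted, is that the column names inside aes() are quoted, so ggplot2 treats them as constant strings instead of columns.

```r
library(ggplot2)

# Toy stand-in for the poster's CSV (column names taken from the question;
# the values are made up for illustration).
set.seed(1)
toy <- data.frame(
  Genus = rep(c("GenusA", "GenusB"), each = 20),
  Group = rep(c("pet", "stray"), times = 20),
  Relative_abundance = c(rexp(20, rate = 5), rexp(20, rate = 0.5))
)

# Unquoted column names inside aes() map the columns themselves;
# quoted strings would be treated as constants.
p <- ggplot(toy, aes(x = Genus, y = Relative_abundance, fill = Group)) +
  geom_boxplot(position = position_dodge(width = 0.8), alpha = 0.8) +
  scale_fill_manual(values = c("stray" = "blue", "pet" = "red")) +
  scale_y_sqrt() +  # square-root y axis; zeros stay at zero
  theme_minimal()

print(p)
```

Unlike a log scale, the square-root scale can display zeros directly, so no pseudocount is needed just for plotting.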

Finally, I don't think you have enough data to really show that the "outliers" are actually outliers in need of being removed. Yes, they are far outside of the distribution otherwise, but you only have ~50 data points in that genus. Depending on what stats you use, you could check the Residuals vs Leverage diagnostic plots in R to see if they have more support for being removed.
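For what it's worth, a minimal sketch of that diagnostic check in base R. The data are simulated for a single genus, and the linear model is only one possible choice of stats, so treat both as assumptions rather than a recommendation.

```r
# Simulated abundances for a single genus (made-up values, unbalanced groups).
set.seed(7)
one_genus <- data.frame(
  Group = factor(rep(c("pet", "stray"), times = c(30, 20))),
  Relative_abundance = c(runif(30, 0, 1), runif(19, 0, 1), 25)  # row 50 is extreme
)

# A simple linear model is only one possible choice of "stats".
fit <- lm(Relative_abundance ~ Group, data = one_genus)

# Diagnostic plot number 5 is Residuals vs Leverage.
plot(fit, which = 5)

# Cook's distance gives a numeric view of each point's influence.
cooks <- cooks.distance(fit)
most_influential <- unname(which.max(cooks))
print(most_influential)
```

A point with a large Cook's distance is influential, but that alone does not justify removing it; it only flags the point for a closer look.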


Please provide data as dput(), not via any random dropbox, that could be anything (also malware, theoretically). If the data contain zeros, which log transformation turns into -Inf, one typically adds a pseudocount, like 1 or 0.1, before the transformation.
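A small base R sketch of the pseudocount idea. The numbers and the names `rel_abund` and `pseudocount` are made up for illustration; the right pseudocount depends on the units of the real data.

```r
# Made-up relative abundances, including zeros.
rel_abund <- c(0, 0, 0.002, 0.05, 1.3, 12)

# Without a pseudocount, zeros become -Inf.
raw_log <- log10(rel_abund)

# Hypothetical pseudocount; pick one that is small relative to the data.
pseudocount <- 0.1
log_abund <- log10(rel_abund + pseudocount)

print(raw_log)
print(log_abund)
```

If the abundances are proportions between 0 and 1, a pseudocount of 1 would swamp most of the signal, so a smaller value is probably the safer choice there.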


I agree with ATpoint - you can replace 0s with 0.1.

For determining skewness, I like to use the skewness() function in the moments package in R. Since your data appears to be right-skewed, you're right that a log10 transformation might give you a more normal distribution. As dthorbur mentioned, there are other transformations that you could do. Less extreme transformations would include square root, cube root, log2, and natural log. A more extreme transformation would be to take the inverse, although I don't think it would make sense to use that with fold change data. log2 fold change is commonly used in biomedical research.
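To make the skewness comparison concrete, here is a rough sketch on simulated right-skewed data (not the poster's). It computes the moment-based skewness by hand so that it runs in base R; as far as I know, skewness() in the moments package uses the same formula, but treat that as an assumption. The helper name `skew` is hypothetical.

```r
# Moment-based sample skewness: third central moment divided by
# the second central moment raised to the power 3/2.
skew <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

set.seed(42)
abund <- rlnorm(200, meanlog = 0, sdlog = 1)  # simulated right-skewed values

skew_raw  <- skew(abund)
skew_sqrt <- skew(sqrt(abund))
skew_log  <- skew(log10(abund))

print(c(raw = skew_raw, sqrt = skew_sqrt, log10 = skew_log))
```

A value near 0 suggests a roughly symmetric distribution, so comparing the three numbers gives a quick feel for how strong a transformation the data need.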

