Question

How can I adjust the Y-axis scale when making a relative abundance box plot?

1

11 months ago

ohtang7 ▴ 40

I am creating a relative abundance box plot comparing two groups (pet, stray) across eight genera. However, the boxes in the resulting plot are squashed and hard to read, because the Y-axis values span too wide a range.

I assume that 1) removing the Y-axis outliers and 2) log-transforming the relative abundance data to rescale it would be good solutions.

What should I do if I have many 0 values when I apply the log10 transformation?

Is there a commonly used R library that applies this kind of transformation automatically?

Original result: (image not shown)

My original R code is here:

    library(ggplot2)

    data <- read.csv("relative abundance raw data (putative pathogen).csv")

    p <- ggplot(data, aes(x="Genus", y="Relative_abundance", fill="Group")) +
      geom_boxplot(position = position_dodge(width = 0.8), alpha = 0.8) +
      labs(title = "Relative Abundance Comparison", x = "Genus", y = "Relative Abundance", fill = "Group") +
      theme_minimal() +
      scale_fill_manual(values = c("stray" = "blue", "pet" = "red"))

    p + geom_jitter(shape=16, position=position_jitter(0.2))

My raw data file can be downloaded here:

https://drive.google.com/file/d/1Dxy2EqqgC2BQK6b92gRHSI5t7DtA29YA/view?usp=sharing

Please help me make a prettier box plot by adjusting the y-scale!

statistics box-plot R scale_adjustment logarithm • 919 views

ADD COMMENT • link updated 11 months ago by Jeremy ▴ 930 • written 11 months ago by ohtang7 ▴ 40

2

You can try scale_y_sqrt() instead if you don't like the look of the log10 transformation. As an aside, it doesn't look like your fill variable is working as everything is the same grey...
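To make both points concrete, here is a minimal sketch on simulated data (the data frame `toy` and its values are made up; only the column names come from the question's code). In the code as posted, the column names inside aes() are quoted, so ggplot2 treats them as constant strings rather than columns, which may be why the fill comes out uniform grey. Unquoting them and adding scale_y_sqrt() could look like this:

```r
library(ggplot2)

# Simulated stand-in for the poster's data; values are NOT the real data.
set.seed(1)
toy <- data.frame(
  Genus = rep(c("GenusA", "GenusB"), each = 40),
  Group = rep(c("pet", "stray"), times = 40),
  Relative_abundance = rexp(80, rate = 5)^2   # non-negative, right-skewed
)

# Unquoted names refer to columns; quoted strings would be mapped as constants.
p <- ggplot(toy, aes(x = Genus, y = Relative_abundance, fill = Group)) +
  geom_boxplot(position = position_dodge(width = 0.8), alpha = 0.8) +
  scale_fill_manual(values = c("stray" = "blue", "pet" = "red")) +
  scale_y_sqrt() +   # square-root axis; zeros remain plottable
  theme_minimal()

print(p)
```

Unlike a log axis, the square-root axis needs no special handling for zero values.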

Finally, I don't think you have enough data to really show that the "outliers" are actually outliers in need of being removed. Yes, they are far outside of the distribution otherwise, but you only have ~50 data points in that genus. Depending on what stats you use, you could check the Residuals vs Leverage diagnostic plots in R to see if they have more support for being removed.
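As a rough illustration of that diagnostic, assuming a simple linear model is the analysis being used (the model, the simulated data, and the planted extreme value are all assumptions for the example, not the poster's data):

```r
# Simulated, deliberately unbalanced data; for illustration only.
set.seed(42)
toy <- data.frame(
  Group = rep(c("pet", "stray"), times = c(30, 20)),
  Relative_abundance = c(rlnorm(30, meanlog = -3), rlnorm(20, meanlog = -2.5))
)
toy$Relative_abundance[50] <- 5   # one artificially extreme value

fit <- lm(Relative_abundance ~ Group, data = toy)

# which = 5 asks plot.lm() for the Residuals vs Leverage panel.
plot(fit, which = 5)

# Cook's distance summarises how strongly each point influences the fit.
cooks <- cooks.distance(fit)
most_influential <- unname(which.max(cooks))
print(most_influential)
```

A point that is flagged here is a candidate for a closer look, not automatically a value to delete.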

ADD REPLY • link 11 months ago by dthorbur ★ 2.5k

2

Please provide data via dput(), not via a random file-sharing link, which could be anything (even malware, theoretically). If the data contain zeros, which a log transformation cannot handle (log10(0) is -Inf), one typically adds a pseudocount, like 1 or 0.1, before the transformation.
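A small sketch of both suggestions, using made-up values. The pseudocount of 0.001 is an assumption chosen for this toy vector; the right value depends on whether the abundances are stored as proportions or percentages.

```r
# Made-up relative abundances containing zeros (illustration only).
abund <- c(0, 0, 0.002, 0.01, 0.05, 0.4)

raw_log <- log10(abund)           # the two zeros become -Inf

pseudo <- 0.001                   # assumed pseudocount, not a recommendation
log_abund <- log10(abund + pseudo)

# dput() prints R code that recreates an object, so it can be pasted into a post.
dput(head(data.frame(abund, log_abund), 3))
```

Pasting the dput() output into the question lets others rebuild the exact object without downloading anything.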

ADD REPLY • link 11 months ago by ATpoint 85k

1

I agree with ATpoint: you can replace 0s with 0.1.

For determining skewness, I like to use the skewness() function in the moments package in R. Since your data appear to be right-skewed, you're right that a log10 transformation might give you a more normal distribution. As dthorbur mentioned, there are other transformations that you could do. Less extreme transformations would include square root and cube root. log2 and natural log differ from log10 only by a constant factor, so they change the axis units but not the shape of the distribution. A more extreme transformation would be to take the inverse, although I don't think it would make sense to use that with fold change data. log2 fold change is commonly used in biomedical research.
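Here is a quick way to compare the candidates, sketched on simulated right-skewed values (the vector `abund` is made up, and it contains no zeros, so no pseudocount is needed here):

```r
library(moments)  # provides skewness(); install.packages("moments") if missing

# Simulated right-skewed values; NOT the poster's data.
set.seed(7)
abund <- rlnorm(200, meanlog = -3, sdlog = 1.2)

skew <- c(
  raw   = skewness(abund),
  sqrt  = skewness(sqrt(abund)),
  cbrt  = skewness(abund^(1/3)),
  log10 = skewness(log10(abund)),
  log2  = skewness(log2(abund))
)
print(round(skew, 2))
```

Values closer to 0 indicate a more symmetric distribution. The log10 and log2 entries come out identical, because skewness does not change when the data are multiplied by a constant.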

ADD REPLY • link 11 months ago by Jeremy ▴ 930