How can I adjust the Y-axis scale when making a relative abundance box plot?
5 months ago
ohtang7 ▴ 40

I am creating a relative abundance boxplot comparing two groups (pet, stray) across eight genera. However, the boxes in the resulting plot are squashed and hard to read, because the Y-axis values span too wide a range.

I assume that 1) removing Y-axis outliers and 2) log-transforming the relative abundance data for scaling would be good solutions.

What should I do about the many 0 values when I apply a log10 transformation?

Is there a widely used R library that automatically performs this kind of transformation?

Original result: [image]

My original R code is here:

    library(ggplot2)
    data <- read.csv("relative abundance raw data (putative pathogen).csv")
    p <- ggplot(data, aes(x="Genus", y="Relative_abundance", fill="Group")) +
      geom_boxplot(position = position_dodge(width = 0.8), alpha = 0.8) +
      labs(title = "Relative Abundance Comparison",
           x = "Genus", y = "Relative Abundance", fill = "Group") +
      theme_minimal() +
      scale_fill_manual(values = c("stray" = "blue", "pet" = "red"))
    p + geom_jitter(shape=16, position=position_jitter(0.2))

My raw data file can be downloaded here:

https://drive.google.com/file/d/1Dxy2EqqgC2BQK6b92gRHSI5t7DtA29YA/view?usp=sharing

Please help me make a pretty box plot by adjusting the y-scale!

statistics box-plot R scale_adjustment logarithm • 641 views

You can try scale_y_sqrt() instead if you don't like the look of the log10 transformation. As an aside, it doesn't look like your fill variable is working as everything is the same grey...
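Here is a minimal sketch of both points, assuming toy data, since the original CSV is not reproduced in this thread. The column names (`Genus`, `Relative_abundance`, `Group`) come from the question's code; the genus labels and values are made up. Passing the column names to `aes()` unquoted maps the actual columns rather than constant strings, which is one plausible reason the fill ends up uniform.

```r
library(ggplot2)

set.seed(1)
# Hypothetical toy data; only the column names follow the question's code
toy <- data.frame(
  Genus = rep(c("GenusA", "GenusB"), each = 40),
  Group = rep(c("pet", "stray"), times = 40),
  Relative_abundance = rexp(80, rate = 5)
)

# Unquoted names inside aes() map data columns to aesthetics
p <- ggplot(toy, aes(x = Genus, y = Relative_abundance, fill = Group)) +
  # outlier.shape = NA hides the boxplot's own outlier dots,
  # because the point layer below already draws every observation
  geom_boxplot(position = position_dodge(width = 0.8), alpha = 0.8,
               outlier.shape = NA) +
  # jitter the points within each dodged box instead of across the whole genus
  geom_point(shape = 16,
             position = position_jitterdodge(jitter.width = 0.2,
                                             dodge.width = 0.8)) +
  scale_y_sqrt() +  # square-root Y axis; zeros stay at zero
  scale_fill_manual(values = c("stray" = "blue", "pet" = "red")) +
  labs(title = "Relative Abundance Comparison",
       x = "Genus", y = "Relative Abundance", fill = "Group") +
  theme_minimal()

print(p)
```

Swapping `scale_y_sqrt()` for `scale_y_log10()` gives the log version, but any zeros would then need handling first.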

Finally, I don't think you have enough data to really show that the "outliers" are actually outliers in need of being removed. Yes, they are far outside of the distribution otherwise, but you only have ~50 data points in that genus. Depending on what stats you use, you could check the Residuals vs Leverage diagnostic plots in R to see if they have more support for being removed.
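As a rough sketch of that check, assuming a plain linear model of abundance on group for a single genus (the model choice and the toy data are assumptions, not something taken from the original dataset):

```r
set.seed(42)
# Hypothetical data for one genus: 50 samples, two extreme values in "stray"
toy <- data.frame(
  Group = rep(c("pet", "stray"), each = 25),
  Relative_abundance = c(rexp(48, rate = 10), 3.5, 4.2)
)

fit <- lm(Relative_abundance ~ Group, data = toy)

# Panel 5 of plot.lm is the Residuals vs Leverage diagnostic. With a single
# factor predictor and balanced groups the leverage is constant, so R draws
# residuals against factor levels instead.
plot(fit, which = 5)

# Cook's distance gives a numeric view of the same idea
cd <- cooks.distance(fit)
flagged <- which(cd > 4 / nrow(toy))  # 4/n is a rule-of-thumb cutoff, not a formal test
toy[flagged, ]
```

Points flagged this way are candidates for a closer look, not automatic removals.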


Please provide data as dput() output, not via a random file-sharing link that could be anything (even malware, theoretically). If the data contain zeros, which a log transformation turns into -Inf, one typically adds a pseudocount, such as 1 or 0.1, before transformation.
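A small sketch of both suggestions, using made-up values. The pseudocount of 0.1 is one of the values mentioned above; what is appropriate depends on the scale of the data, so treat it as an assumption.

```r
# Hypothetical relative abundances that include zeros
x <- c(0, 0, 0.002, 0.015, 0.4, 12.5)

raw_log <- log10(x)       # the zeros become -Inf and cannot be drawn on a log axis

pseudocount <- 0.1        # assumed value; choose it relative to your data's scale
x_log <- log10(x + pseudocount)
x_log                     # all values are finite now

# dput() prints an object as R code that others can paste straight into a session
dput(head(data.frame(Genus = "GenusA", Relative_abundance = x), 3))
```

In ggplot2 the same shift can be applied inside the mapping, for example `aes(y = Relative_abundance + 0.1)` combined with `scale_y_log10()`.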


I agree with ATpoint: you can replace 0s with 0.1.

For determining skewness, I like to use the skewness() function in the moments package in R. Since your data appear to be right-skewed, you're right that a log10 transformation might give you a more normal distribution. As dthorbur mentioned, there are other transformations that you could do. Less extreme transformations include the square root and cube root. log2 and natural log differ from log10 only by a constant factor, so they change the shape of the distribution in the same way. A more extreme transformation would be to take the inverse, although I don't think it would make sense to use that with fold change data. log2 fold change is commonly used in biomedical research.
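To compare transformations side by side, here is a sketch on simulated right-skewed data. The `skew()` helper is a hypothetical base-R stand-in that computes the moment-based sample skewness, which to my understanding is the same quantity `moments::skewness()` reports; if the moments package is installed, that function can be used instead.

```r
set.seed(7)
# Hypothetical right-skewed abundances (log-normal), all strictly positive
x <- rlnorm(200, meanlog = -3, sdlog = 1.2)

# Moment-based sample skewness: third central moment over variance^(3/2)
skew <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

s <- sapply(
  list(raw = x, sqrt = sqrt(x), cube_root = x^(1/3), log10 = log10(x)),
  skew
)
s  # values closer to 0 indicate a more symmetric distribution
```

Because skewness does not change when the data are multiplied by a positive constant, log2 and natural log give the same skewness as log10 here.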
