Question: How can I remove the outliers from a boxplot and fill the groups?
dpc (India) wrote, 7 weeks ago:

I have generated these two boxplots: [image: two boxplots]

But I am unable to remove the points and the outliers. I have already used outliers.shape = NA, which didn't work.

I also want to fill the groups, as in this image: [image]

Can anyone please tell me how I can do that?

Here's my code:

library(phyloseq)   # plot_richness()
library(ggpubr)     # stat_compare_means()
library(ggplot2)

p <- plot_richness(physeq_rarefied, x = "type", color = "type",
                   measures = c("Shannon", "Observed")) +
  stat_compare_means(method = "wilcox.test") +
  geom_boxplot() +
  labs(x = "Sample types", y = "Alpha Diversity Measure",
       title = "Alpha diversity of control and test samples")

Thanks, dpc

Tags: statistics, R

Use outlier.shape = NA in ggplot2 to hide outliers. Try changing the notchwidth value.

library(ggplot2)
ggplot(iris, aes(Species, Sepal.Length, fill = Species)) +
    geom_boxplot(
        outlier.shape = NA,   # hide the outlier points
        notch = TRUE,         # draw a notched box plot
        notchwidth = 0.10)    # notch width relative to the box width

Instead of box plots, try beeswarm or violin plots with jitter.

Reply by cpad0112, 7 weeks ago.
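A minimal sketch of the violin-plus-jitter alternative, using the built-in iris data as a stand-in for the alpha-diversity values. Every observation is drawn as a jittered point, so no point is singled out as an "outlier". Beeswarm plots need an extra package (for example ggbeeswarm) and are not shown here.

```r
library(ggplot2)

# Violin: shape of the whole distribution; jitter: every single observation.
p <- ggplot(iris, aes(Species, Sepal.Length, fill = Species)) +
  geom_violin(alpha = 0.5) +
  geom_jitter(width = 0.1, height = 0, size = 1) +  # spread points horizontally only
  labs(x = "Species", y = "Sepal length (cm)")

# print(p) to draw the plot
```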
antonioggsousa wrote, 7 weeks ago:

Hi @dpc,

To get a notched boxplot (I believe this is the right term for the figure you want to make: https://sites.google.com/site/davidsstatistics/home/notched-box-plots ), just add the following option inside the geom_boxplot() call:

geom_boxplot(notch = TRUE)

Then, after running your code, you can do:

p$layers[1] <- NULL

This removes the first layer of the ggplot object, which is the geom_point() layer.
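Since p$layers is a plain list, a layer can also be dropped by the type of its geom instead of by position, which still works if the point layer is not first. A minimal sketch, assuming a stand-in plot built from iris that mimics the plot_richness() output (points drawn first, boxplot added on top); the name is_point_layer is hypothetical.

```r
library(ggplot2)

# Stand-in for the plot_richness() output: a point layer, then a boxplot layer.
p <- ggplot(iris, aes(Species, Sepal.Length)) +
  geom_point() +
  geom_boxplot(notch = TRUE)

# Positional removal, as above:  p$layers[1] <- NULL
# Removal by geom type: keep every layer whose geom is not GeomPoint.
is_point_layer <- vapply(p$layers,
                         function(l) inherits(l$geom, "GeomPoint"),
                         logical(1))
p$layers <- p$layers[!is_point_layer]
```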

I hope this helps,

António
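For reference, the notches span roughly median +/- 1.58 * IQR / sqrt(n), a rough 95% interval for comparing medians: if two notches do not overlap, the medians likely differ. A minimal sketch using base R's boxplot.stats(), which reports these limits as $conf, with iris setosa standing in for one sample group. ggplot2 applies the same 1.58 rule but computes the IQR with quantile() rather than Tukey's hinges, so its notches can differ slightly.

```r
# Notch limits for one group, from boxplot.stats() and by hand.
v <- iris$Sepal.Length[iris$Species == "setosa"]

s <- boxplot.stats(v)
hinges  <- s$stats[c(2, 4)]    # lower and upper hinges (from fivenum())
by_hand <- s$stats[3] + c(-1.58, 1.58) * diff(hinges) / sqrt(length(v))

s$conf     # notch limits reported by boxplot.stats()
by_hand    # median +/- 1.58 * IQR / sqrt(n), computed directly
```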


Thanks Sir, @Antonio. Yes, I am aware of the notched boxplot :). My only concerns were filling the boxes by type and removing the outliers. I will try what you have suggested, though.

Thanks a lot, dpc

Reply by dpc, 7 weeks ago.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work.


Reply by RamRS, 7 weeks ago.

This way, all the dots disappeared except the outliers.

Like in this image: [image]

Also, how can I fill the boxes with colours?

Reply by dpc, 7 weeks ago.

Hi,

Use the same instructions I gave you, but replace the geom_boxplot() line with the following:

geom_boxplot(aes(fill = type), notch = TRUE, outlier.shape = NA)

Let me know if it works.

António

Reply by antonioggsousa, 7 weeks ago.

This is my code now:

p <- plot_richness(physeq_rarefied, x = "type", color = "type",
                   measures = c("Shannon", "Observed")) +
  stat_compare_means(method = "wilcox.test") +
  geom_boxplot(aes(fill = type), notch = TRUE, outliers.shape = NA) +
  scale_fill_manual(values = c("hotpink", "skyblue")) +
  labs(x = "Sample types", y = "Alpha Diversity Measure",
       title = "Alpha diversity of control and test samples")
p
p$layers[1] <- NULL
p

And this is the output. The outliers are still there:

[image]

EDIT: Replacing outliers.shape = NA with outlier.size = -1 removes all the outliers. Here's the output:

[image]

Thanks, dpc

Reply by dpc, 7 weeks ago.
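The argument in the code above is spelled outliers.shape (plural), which is not a geom_boxplot() argument; ggplot2 ignores unrecognized parameters with a warning, which would explain why the outliers remained. The documented way to hide them is outlier.shape = NA. A minimal sketch of the corrected call, assuming a hypothetical data frame df laid out like the long-format data behind a plot_richness() figure (one row per sample and measure, with type, variable and value columns). Because it uses plain ggplot(), there is no point layer to remove.

```r
library(ggplot2)

# Hypothetical long-format data: 20 control and 20 test samples per measure.
set.seed(42)
df <- data.frame(
  type     = rep(c("control", "test"), each = 20, times = 2),
  variable = rep(c("Shannon", "Observed"), each = 40),
  value    = c(rnorm(40, mean = 3, sd = 0.5), rnorm(40, mean = 200, sd = 40))
)

p <- ggplot(df, aes(type, value)) +
  # fill by group; outlier.shape (singular) = NA hides the outlier points
  geom_boxplot(aes(fill = type), notch = TRUE, outlier.shape = NA) +
  scale_fill_manual(values = c("hotpink", "skyblue")) +
  facet_wrap(~ variable, scales = "free_y") +   # one panel per measure
  labs(x = "Sample types", y = "Alpha Diversity Measure",
       title = "Alpha diversity of control and test samples")
```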

Try the geom_boxplot2 function from Ipaper (https://github.com/kongdd/Ipaper/) for boxplots without outliers.

Example code:

library(Ipaper)
library(ggplot2)

ggplot(iris[,c(1,5)], aes(Species,Sepal.Length))+
    geom_boxplot2()

Reply by cpad0112, 7 weeks ago.
Alex Reynolds (Seattle, WA USA) wrote, 7 weeks ago:

If you want clarity about and control over how you deal with outliers, without relying on the "black box" that other people's code provides, there are a couple of common approaches you can implement yourself:

  1. Trimming
  2. Winsorization

In both approaches, you specify a percentage cutoff. You sort the data from low to high, take that percentage of values from the two ends of the sorted data, and handle them according to the method.

For instance, if you have 1000 points and a 10% cutoff, the values you deal with are the top 50 and bottom 50 values. Each of these subsets makes up 5% of the total dataset, or 10% in total.

With the trimming method, any value from your dataset which falls in this cutoff is removed. If you start with 1000 values and have a 10% cutoff, you end up with a dataset containing 900 values.

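With the Winsorization approach, unlike trimming, any value that falls in this cutoff is not removed but is replaced with the nearest value inside the cutoff: values in the bottom 5% become the 5th-percentile value, and values in the top 5% become the 95th-percentile value. You still end up with 1000 values.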

Both approaches change your distribution, but they do suppress outliers.

How many outliers are removed depends entirely on your choice of cutoff.

In R, you could trim simply by excising the values of a vector x that fall outside the cutoff (e.g., using a 10% cutoff):

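# x: your numeric vector of values
q <- quantile(x, probs = c(5, 95)/100)   # 5th and 95th percentiles of x
trimmed_x <- x[x > q[1] & x < q[2]]      # keep values strictly between them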
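The same trimming can be wrapped in a small helper that splits the cutoff evenly between the two tails, which also checks the 1000 to 900 arithmetic described above. A minimal sketch; trim_outliers is a hypothetical name and the data are simulated with rnorm().

```r
# Hypothetical helper: drop the lowest and highest cutoff/2 of the values.
trim_outliers <- function(x, cutoff = 0.10) {
  q <- quantile(x, probs = c(cutoff / 2, 1 - cutoff / 2), names = FALSE)
  x[x > q[1] & x < q[2]]
}

set.seed(1)
x <- rnorm(1000)                       # simulated data, no ties
trimmed_x <- trim_outliers(x, cutoff = 0.10)
length(trimmed_x)                      # 900: 50 values removed from each tail
```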

In R, you could winsorize with the winsorize() function from the statar package (e.g., using a 10% cutoff):

library(statar)   # provides winsorize()
winsorized_x <- winsorize(x, probs = c(0.05, 0.95))

Then plot trimmed_x or winsorized_x as if it were your original dataset.
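If you would rather avoid the statar dependency, Winsorization can be sketched in base R by clamping values to the cutoff quantiles with pmax() and pmin(). This is a minimal sketch; winsorize_base is a hypothetical name, and statar may compute its quantiles differently, so results need not match exactly.

```r
# Hypothetical helper: clamp values outside the cutoff to the cutoff quantiles.
winsorize_base <- function(x, cutoff = 0.10) {
  q <- quantile(x, probs = c(cutoff / 2, 1 - cutoff / 2), names = FALSE)
  pmin(pmax(x, q[1]), q[2])   # raise values below q[1], lower values above q[2]
}

set.seed(1)
x <- rnorm(1000)                     # simulated data, no ties
winsorized_x <- winsorize_base(x)
length(winsorized_x)                 # still 1000
sum(winsorized_x != x)               # 100 values (50 per tail) were replaced
```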


Another approach would be to calculate the box plot statistics, set limits an equal distance on either side of the box (around the midpoint between its upper and lower edges), multiply those limits by an appropriate factor, and pass them to coord_cartesian() in ggplot2.

Reply by cpad0112, 7 weeks ago.
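A minimal sketch of the coord_cartesian() idea, using iris as stand-in data and taking the whisker ends from base R's boxplot.stats() as the limits (one reasonable choice, not necessarily the exact rule intended above), padded by a hypothetical 5% factor. coord_cartesian() only zooms the view, so the boxes are still computed from every observation. Setting limits with ylim() or scale_y_continuous(limits = ...) would instead drop out-of-range points before the box statistics are computed.

```r
library(ggplot2)

# Whisker ends per group: rows 1 and 5 of boxplot.stats()$stats.
whiskers <- sapply(split(iris$Sepal.Length, iris$Species),
                   function(v) boxplot.stats(v)$stats[c(1, 5)])
ylim <- c(min(whiskers[1, ]), max(whiskers[2, ]))
pad  <- 0.05 * diff(ylim)            # hypothetical padding factor

p <- ggplot(iris, aes(Species, Sepal.Length, fill = Species)) +
  geom_boxplot(outlier.shape = NA) +
  # zoom without removing data, so box statistics use all observations
  coord_cartesian(ylim = ylim + c(-pad, pad), expand = FALSE)
```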