If you want clarity about, and control over, how you deal with outliers, without relying on the "black box" of other people's code, there are a couple of common approaches you can apply yourself:

- Trimming
- Winsorization

In both approaches, you specify a percentage cutoff. You sort the data from low to high and split the cutoff evenly between the two tails. The values that fall inside those tails are the ones you deal with, and how you deal with them depends on the method.

For instance, if you have 1000 points and a 10% cutoff, the values you deal with are the top 50 and bottom 50 values. Each of these subsets makes up 5% of the total dataset, or 10% in total.
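As a quick check of that arithmetic, here is a small sketch in R. The variable names are only illustrative.

```r
n <- 1000          # number of data points
cutoff <- 0.10     # total fraction of values treated as extreme

per_tail   <- n * cutoff / 2                   # values affected in each tail
tail_probs <- c(cutoff / 2, 1 - cutoff / 2)    # quantile probabilities bounding the middle

per_tail     # 50
tail_probs   # 0.05 0.95
```

The two probabilities in `tail_probs` are the ones you would pass to `quantile()` to find the cutoff values.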

With the *trimming* method, any value from your dataset which falls in this cutoff is removed. If you start with 1000 values and have a 10% cutoff, you end up with a dataset containing 900 values.

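With the *Winsorization* approach, unlike trimming, any value which falls in this cutoff is not removed, but is instead replaced with the nearest value that was kept: low extremes are raised to the lower cutoff value and high extremes are lowered to the upper cutoff value. You still end up with 1000 values.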
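To make the difference from trimming concrete, here is a minimal base-R sketch that Winsorizes by clamping values to the 5th and 95th percentiles. The data are simulated, and clamping to quantiles is one common convention that I am assuming here; library functions may treat ties or interpolation differently.

```r
set.seed(42)                          # simulated data, for illustration only
x <- c(rnorm(998), -25, 30)           # 998 ordinary values plus two planted extremes

# Cutoff values for a 10% cutoff (5% in each tail)
q <- unname(quantile(x, probs = c(0.05, 0.95)))

# Clamp: anything below q[1] becomes q[1], anything above q[2] becomes q[2]
winsorized <- pmin(pmax(x, q[1]), q[2])

length(winsorized)   # 1000, same as length(x): nothing was dropped
range(winsorized)    # now equal to the two cutoff values in q
```

The planted values -25 and 30 are still "present" as observations, but they no longer sit far from the rest of the data.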

Both approaches change your distribution, but both limit the effect of outliers.

How many values are removed or replaced depends entirely on your choice of cutoff.
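A rough way to see the effect of the cutoff is to trim the same data at several cutoffs and count what is lost. This is only a sketch: `trim_by_cutoff` is a hypothetical helper written for this example, and the data are simulated.

```r
set.seed(1)            # simulated data, for illustration only
x <- rnorm(1000)

# Hypothetical helper: drop the values outside the central (1 - cutoff) share
trim_by_cutoff <- function(x, cutoff) {
  q <- quantile(x, probs = c(cutoff / 2, 1 - cutoff / 2))
  x[x > q[1] & x < q[2]]
}

cutoffs <- c(0.02, 0.10, 0.20)
removed <- sapply(cutoffs, function(cu) length(x) - length(trim_by_cutoff(x, cu)))

# One row per cutoff, with the number of values that were dropped
data.frame(cutoff = cutoffs, removed = removed)
```

A larger cutoff always removes at least as many values, whether or not those values are "really" outliers, so the cutoff is a judgment call rather than something the method decides for you.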

In R, you could *trim* simply by dropping the values that fall outside the cutoff quantiles (e.g., using a 10% cutoff):

```
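# 5th and 95th percentiles bound the middle 90% of x
q <- quantile(x, probs = c(0.05, 0.95))
# keep only the values strictly between the two bounds
trimmed_x <- x[x > q[1] & x < q[2]]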
```

In R, you could *Winsorize* with the `winsorize` function from `statar` (e.g., using a 10% cutoff):

```
library(statar)  # provides winsorize()
winsorized_x <- winsorize(x, probs = c(0.05, 0.95))
```

Then plot `trimmed_x` or `winsorized_x` as if it were your original dataset.
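For example, a side-by-side box plot shows what each method did to the data. This sketch uses simulated data and a base-R clamp with `pmin`/`pmax` as a stand-in for the `statar` call, so that it runs without extra packages; both are assumptions made for illustration.

```r
set.seed(123)                               # simulated data, for illustration only
x <- c(rnorm(990), runif(10, 8, 12))        # mostly normal, plus ten large values

q <- unname(quantile(x, probs = c(0.05, 0.95)))
trimmed_x    <- x[x > q[1] & x < q[2]]      # extremes dropped
winsorized_x <- pmin(pmax(x, q[1]), q[2])   # extremes clamped to the cutoff values

# Draw the three versions next to each other and keep the summary statistics
bp <- boxplot(
  list(original = x, trimmed = trimmed_x, winsorized = winsorized_x),
  main = "Effect of a 10% cutoff"
)
bp$n   # number of observations behind each box
```

The trimmed box is built from fewer observations than the original, while the Winsorized box keeps all of them.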

Use `outlier.shape = NA` in ggplot to hide outliers. Try changing the `notchwidth` value. Instead of box plots, try beeswarm or violin plots with jitter.
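A minimal sketch of both suggestions, assuming `ggplot2` is installed. The data frame `df` and its column names are made up for this example.

```r
library(ggplot2)  # assumed to be installed

set.seed(7)       # simulated data, for illustration only
df <- data.frame(
  group = rep(c("a", "b"), each = 200),
  value = c(rnorm(200), rnorm(200, mean = 1))
)
df$value[c(1, 201)] <- c(9, -8)   # plant one extreme value per group

# Box plot that does not draw the outlier points
# (the hidden points still influence the axis range)
p_box <- ggplot(df, aes(x = group, y = value)) +
  geom_boxplot(outlier.shape = NA)

# Violin plot with jittered raw points on top, so every value stays visible
p_violin <- ggplot(df, aes(x = group, y = value)) +
  geom_violin() +
  geom_jitter(width = 0.1, alpha = 0.4)

print(p_box)
print(p_violin)
```

Hiding the points only changes what is drawn; the underlying data and the box statistics are unchanged. A beeswarm plot needs an additional package (for instance `ggbeeswarm`), which is not shown here.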
