You could winsorize your data based on quantiles. This means you first identify the values that are greater than, e.g., 99% of the data and then replace these extreme values with the value of the 99th percentile. Below is example code for some dummy numeric data. It assumes that you have already imported your data into R and have it as a matrix, data frame, or similar container with numeric values:

```
## example numeric data (seed set so the example is reproducible)
set.seed(1)
numeric.data <- rnorm(1000, 100, 25)
## plot raw data (two panels side by side)
par(mfrow=c(1,2), bty="L")
boxplot(numeric.data, ylim=c(0, 200), main="raw")
## identify the 99th percentile
quant99perc <- quantile(numeric.data, .99)
## replace values larger than that with the value of the 99th percentile
numeric.data[which(numeric.data > quant99perc)] <- as.numeric(quant99perc)
## plot winsorized data
boxplot(numeric.data, ylim=c(0, 200), main="winsorized")
```

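As you will see in the plot, the outliers at the upper end are gone: they have been capped at the 99th percentile, while all other values are unchanged. Note that this only treats the upper tail; low outliers are left as they are.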
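If you need this for several samples, the same capping can be wrapped in a small helper. This is only a sketch: the function name `winsorize` and its `probs` argument are made up for illustration and do not come from a package. With the default `probs = c(0, 0.99)` it caps only the upper tail, like the code above.

```r
## hypothetical helper: cap values outside the given lower/upper quantiles
winsorize <- function(x, probs = c(0, 0.99)) {
  stopifnot(is.numeric(x), length(probs) == 2, probs[1] < probs[2])
  ## lower and upper limits taken from the data itself
  lims <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  ## values below lims[1] are raised, values above lims[2] are lowered
  pmin(pmax(x, lims[1]), lims[2])
}

## dummy data with one artificial spike
set.seed(1)
x <- c(rnorm(1000, 100, 25), 3000)
w <- winsorize(x)
```

Use e.g. `winsorize(x, probs = c(0.01, 0.99))` if you also want to cap the lower tail.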

Here is a suggestion on how to import bigWig data into R:

written 10 months ago (modified 10 months ago) by ATpoint ♦ **44k**

Thanks for the replies. I have already removed PCR duplicates from the BAM files, but I still get outliers in some areas in both control and treatment samples. These spikes are up to 20 or 30 times higher than the average coverage in most areas.

I think I will try capping the top 1 percentile in R. Any suggestions for an R package that allows importing bigWig files as a matrix?

Thanks

Yes, `rtracklayer`. I updated my answer.
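
For reference, a minimal sketch of how such an import could look, assuming the Bioconductor package `rtracklayer` is installed. The wrapper name `import_bw_scores` and the file name `sample.bw` are placeholders for illustration, not something from the thread.

```r
## hypothetical wrapper: read a bigWig file and return its signal values
import_bw_scores <- function(bw_path) {
  if (!file.exists(bw_path)) stop("bigWig file not found: ", bw_path)
  ## import() returns a GRanges object; the signal sits in the 'score' column
  gr <- rtracklayer::import(bw_path, format = "BigWig")
  gr$score
}

## usage (not run; the file name is a placeholder):
## scores <- import_bw_scores("sample.bw")
## q99 <- quantile(scores, 0.99, names = FALSE)
## scores[scores > q99] <- q99
```

Keep in mind that the intervals in a bigWig file can differ in width, so quantiles computed on `score` are per interval, not per base.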