Question: Removing outliers from bigwig files
0
gravatar for srhic
10 months ago by
srhic20
srhic20 wrote:

Hello,

I have some atac-seq and chip-seq data which I have processed using deeptools to generate normalized bigwig files. I am now using these bigwig file to plot profiles over regions of interest but am having problems with outliers distorting the plots in certain regions. I was wondering if there is any tool that can be used to remove outlier regions from bigwig files?

Thanks

chip-seq next-gen deeptools • 288 views
ADD COMMENTlink modified 10 months ago • written 10 months ago by srhic20

Thanks for the replies. I have already removed pcr duplicates from the bam files but I still get outliers in some areas in both control and treatment samples. These spikes are up to 20 or 30 times higher than the average coverage in most areas.

I think I will try removing the top 1 percentile in R. Any suggestions for an R package the allows importing bigwig files as a matrix?

Thanks

ADD REPLYlink written 10 months ago by srhic20

Yes, rtracklayer, I updated my answer.

ADD REPLYlink written 10 months ago by ATpoint44k
0
gravatar for Carlo Yague
10 months ago by
Carlo Yague5.5k
Canada
Carlo Yague5.5k wrote:

How do you define outliers ? regions with high coverage ? If you were to filter out those regions, then you might remove bound regions (if we talk about ChIP-seq) or other regions of interest. So I do not think this is a good strategy.

Alternatively, you might:

  1. [quickfix] Use a fixed y scale for visualization (instead of going on auto-min-max). So even if the coverage reaches extremely high level, it will be capped at your chosen maximum.

  2. [slower but probably better] Go back to the .bam alignment files, and mark and filter out duplicate reads. PCR or optical duplicates can cause reads pile-up artefacts. Once done, re-create the bigwig files.

ADD COMMENTlink modified 10 months ago • written 10 months ago by Carlo Yague5.5k
0
gravatar for ATpoint
10 months ago by
ATpoint44k
ATpoint44k wrote:

You could winsorize your data based on quantiles. This means you first identify values that are e.g. greater than 99% of the data and then replace the values of these "larger data" by the value of this 99th percentile. Example code for some dummy numeric data below. This assumes that you first imported your data into R and now have it as either a matrix, data frame or similar container with numeric values:

## example numeric data
numeric.data <- rnorm(1000, 100,25)

## plot raw data
par(mfrow=c(1,2), bty="L")
boxplot(numeric.data, ylim=c(0, 200))

## identify the 99th percentile
quant99perc <- quantile(numeric.data, .99)

## replace values larger than that with the value of the 99th percentile
numeric.data[which(numeric.data > quant99perc)] <- as.numeric(quant99perc)

## plot winsorized data
boxplot(numeric.data, ylim=c(0, 200))

As you will see in the plot the outliers have been removed by scaling them down without changing all the other values.

enter image description here

Here a suggestion on how to import bigwig data into R:

ADD COMMENTlink modified 10 months ago • written 10 months ago by ATpoint44k

Thanks! I will try this

ADD REPLYlink written 10 months ago by srhic20

I tried this and it worked perfectly.

However, one confusion I have is if I should normalize the bigwig files for each condition separately. For example, if I have one control and multiple treatment conditions, should I use the 99 percentile of each treatment to scale down its outliers or scale down all treatment conditions to the 99 percentile of my control sample. I am scaling everything separately for now but was wondering which is the better way.

ADD REPLYlink written 10 months ago by srhic20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1087 users visited in the last hour