Question: DiffBind - normalization methods with biologically-relevant differences in signal-to-noise
reskejak (Michigan State University) wrote, 10 months ago:

I have been using DiffBind for differential accessibility analysis of ATAC-seq data and have run into the seemingly infamous normalization issue: our results differ substantially depending on whether counts are normalized by full library read depth or by the depth of reads within consensus peaks (as I understand it, this is what the bFullLibrarySize parameter controls).

Below are two MA plots, from bFullLibrarySize=TRUE and bFullLibrarySize=FALSE, both using DESeq2, and the results are quite different. Judging by how the background density is normalized, the plots would seem to favor the =FALSE setting. We do observe variability in signal-to-noise between samples (FRiP ranges from 0.08 to 0.30). However, this may reflect real biological differences between the experimental groups. This variability is also why we have not relied on edgeR: I recall reading that Rory stated it is not appropriate when signal-to-noise varies widely between samples within an experiment. Does anyone have suggestions for interpreting these results, or for which setting to favor?

We have a number of other flow cells where the outputs of bFullLibrarySize=TRUE and bFullLibrarySize=FALSE also differ, so we are seeking insight into which setting to use when signal-to-noise varies between samples. I'm also planning to try csaw in the near future, so I can compare its results as well.

[Figure: MA plot, bFullLibrarySize=TRUE]

[Figure: MA plot, bFullLibrarySize=FALSE]

Devon Ryan (Freiburg, Germany) wrote, 10 months ago:

To be frank, neither of the options offered by DiffBind is very robust. We've switched to the csaw package for analyzing differential accessibility, since it's more flexible and provides more robust methods.

The assumptions behind the bFullLibrarySize option are as follows:

  • TRUE: There is no great difference in experimental efficiency between groups (and ideally not between samples, but as long as the inter-sample variability is similar within groups you should be OK).
  • FALSE: There is no global change in signal within peaks between groups.
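
To make the two assumptions concrete, here is a minimal sketch with made-up numbers. It is not DiffBind's actual implementation; the function names, sample names, and counts are all hypothetical, and the scaling (library size divided by the mean library size) is just one simple convention chosen for illustration.

```python
# Toy numbers only (hypothetical); this is not DiffBind's implementation.

def scale_factors(library_sizes):
    """Return each library size divided by the mean library size."""
    mean_size = sum(library_sizes.values()) / len(library_sizes)
    return {name: size / mean_size for name, size in library_sizes.items()}

def normalize(counts, factors):
    """Divide every peak count in each sample by that sample's factor."""
    return {name: [c / factors[name] for c in peaks]
            for name, peaks in counts.items()}

# Both samples were sequenced to the same total depth ...
total_reads = {"A": 10_000, "B": 10_000}
# ... but sample B has a lower FRiP (0.10 vs 0.30), so fewer reads fall in peaks.
peak_counts = {"A": [600, 900, 1500], "B": [200, 300, 500]}

# Analogue of bFullLibrarySize=TRUE: scale by total reads per sample.
by_full = normalize(peak_counts, scale_factors(total_reads))

# Analogue of bFullLibrarySize=FALSE: scale by reads within consensus peaks.
reads_in_peaks = {name: sum(peaks) for name, peaks in peak_counts.items()}
by_peaks = normalize(peak_counts, scale_factors(reads_in_peaks))

print(by_full)
print(by_peaks)
```

With full-library scaling, sample B comes out three-fold lower at every peak; with reads-in-peaks scaling the two samples are identical. If the FRiP difference is technical, the second result is the right one. If it reflects a real global loss of accessibility, the second result has erased it. The counts alone cannot tell you which.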

If you have a case where there's an efficiency difference between groups AND you suspect there may be a global shift in accessibility, then neither setting is appropriate and you'll need to find a different way to normalize the samples (e.g., spiking a foreign DNA source into all of the libraries and using that for normalization).
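
As a rough sketch of the spike-in idea, assume you have counted the reads mapping to the foreign DNA in each library. The sample names, counts, and the helper `spike_in_factors` below are hypothetical, and dividing by the mean spike-in count is only one possible convention, not a method from any particular package.

```python
# Toy numbers only (hypothetical); a sketch of spike-in scaling,
# not a specific package's method.

def spike_in_factors(spike_reads):
    """Return each sample's spike-in read count divided by the mean count."""
    mean_spike = sum(spike_reads.values()) / len(spike_reads)
    return {name: n / mean_spike for name, n in spike_reads.items()}

# Reads mapping to the spike-in DNA: sample B recovered 3x fewer,
# which is taken here as a purely technical efficiency difference.
spike_reads = {"A": 3000, "B": 1000}

# Raw counts in consensus peaks: B is 6x lower than A at every peak.
peak_counts = {"A": [600, 900, 1500], "B": [100, 150, 250]}

factors = spike_in_factors(spike_reads)
normalized = {name: [c / factors[name] for c in peaks]
              for name, peaks in peak_counts.items()}

print(normalized)
```

In this toy case the six-fold raw difference splits into a three-fold efficiency difference, which the spike-in removes, and a two-fold global decrease in B, which survives normalization. Neither bFullLibrarySize setting could separate the two.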

You'll need to assess for yourself whether any of the above assumptions fit your biological experiment.
