Question: Verification of Global Reduction of Epigenetic Factors ?
gravatar for Wet&DryImmunology
3.6 years ago by
Wet&DryImmunology220 wrote:

Hi, Hi, I was trying to investigate the changes of H3K27ac upon conditional KO of gene X.

After mapping and counting the data using "featurecount" from "Rsubread" packages, I used the DEseq2 to perform differential analysis.

Below is the scatter plot (WT vs CKO), I could obtain genomic region with the significantly changed H3K27ac (as highlighted in "red" in the plot), but I also noticed that there is global shift of histone marks.

One possibility I could think of is that this shift might just reflect the fact that the sequence depth is different between WT and CKO samples.

I checked the "regularized log transformation" part of the vigentte as well as the original paper of "DESeq2",

To be honest, I was not able to fully understand the underlying mathematics, but it seems me that the "rlog" transformation has factored in "size factor", or the sequence depth . Even though the main rationale of performing "rlog" transformation is to stabilize the variance across gene with difference means.

So does it mean the shift did reflect the global reduction of H3K27ac upon gene X KO, given that the wet experiments and other data process performed properly? ( I guess even it is a correct interpretation, other independent method might be needed to verify the conclusion.)

Or did I just misunderstand of "rlog" methodology?

Actually I raised this issue in the support forum of "bioconductor" specifically concerning using "rlog" (DESeq2),

my question would be more general here:! If I really suspect there are global reductions of an epigenetic feature, instead of just changes at some specific loci (I think most of analysis of ChIP-seq is based on the assumption that the "overall" level of epigenetic feature are "stable" in one way or another), what would be the proper way to prove or disprove that?

Thanks in advance.

enter image description here

chip-seq next-gen R • 1.1k views
ADD COMMENTlink modified 3.6 years ago by Devon Ryan98k • written 3.6 years ago by Wet&DryImmunology220

Linking to the question posted on Bioconductor support:

ADD REPLYlink written 3.6 years ago by Michael Love2.1k
gravatar for Devon Ryan
3.6 years ago by
Devon Ryan98k
Freiburg, Germany
Devon Ryan98k wrote:

If your suspect that there are global changes in a feature (in this case, a histone modification), then you can't rely on the normalization methods built into any RNAseq package. For that, you need to (A) normalize only to non-peak regions or (B) normalize to an ideally off-species spike-in. In the former case, I presume that's what diffbind would do, which is what you should be using anyway.

ADD COMMENTlink written 3.6 years ago by Devon Ryan98k

@Ryan, thank you for your suggestion. I guess (B) the off-specifies spike-in is the definitive solution. For (A) could you explain a little more? it did not seem to be obvious to me why normalized to non peak region would address the issue. I did some homework about "Diffbind", it seems to me that "Diffbind" also relied on "DEseq2" (or "EdgeR"), to compare experiment group vs control group, dba.count documentation says: "Note that all raw read counts are maintained for use by dba.analyze, regardless of how this is set", so the default workflow of Diffbind seems to use raw count. (I guess this is because DEseq2 requires raw counts without any normalization as inputs). Without consideration of performing differential analysis, dba.count does provide many options to normalized count data, but I could not find any argument performing "normalize to non-peak regions"... If I sloppily missed something, please let me know...

ADD REPLYlink modified 3.6 years ago • written 3.6 years ago by Wet&DryImmunology220

Coverage in non-peak regions shouldn't be affected by global changes in the signal, which is what causes issues in all of the normal normalization methods.

I've never dug into diffbind to see if this is what it's doing. If it's not, then randomly select 10000 or so non-peak regions, count in those and put them into DESeq2. Get the scaling factor for that and then apply it to your peak counts. This isn't perfect, since one could consider cases where the underlying histone modification level is affecting the background coverage rate, but it'll be better than what you're currently doing.

ADD REPLYlink written 3.6 years ago by Devon Ryan98k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1657 users visited in the last hour