Question

Verification of Global Reduction of Epigenetic Factors ?

1

Entering edit mode

6.9 years ago

Wet&DryImmunology ▴ 240

Hi, Hi, I was trying to investigate the changes of H3K27ac upon conditional KO of gene X.

After mapping and counting the data using "featurecount" from "Rsubread" packages, I used the DEseq2 to perform differential analysis.

Below is the scatter plot (WT vs CKO), I could obtain genomic region with the significantly changed H3K27ac (as highlighted in "red" in the plot), but I also noticed that there is global shift of histone marks.

One possibility I could think of is that this shift might just reflect the fact that the sequence depth is different between WT and CKO samples.

I checked the "regularized log transformation" part of the vigentte as well as the original paper of "DESeq2",

To be honest, I was not able to fully understand the underlying mathematics, but it seems me that the "rlog" transformation has factored in "size factor", or the sequence depth . Even though the main rationale of performing "rlog" transformation is to stabilize the variance across gene with difference means.

So does it mean the shift did reflect the global reduction of H3K27ac upon gene X KO, given that the wet experiments and other data process performed properly? ( I guess even it is a correct interpretation, other independent method might be needed to verify the conclusion.)

Or did I just misunderstand of "rlog" methodology?

Actually I raised this issue in the support forum of "bioconductor" specifically concerning using "rlog" (DESeq2),

my question would be more general here:! If I really suspect there are global reductions of an epigenetic feature, instead of just changes at some specific loci (I think most of analysis of ChIP-seq is based on the assumption that the "overall" level of epigenetic feature are "stable" in one way or another), what would be the proper way to prove or disprove that?

Thanks in advance.

enter image description here

ChIP-Seq next-gen R • 1.9k views

ADD COMMENT • link updated 6.9 years ago by Devon Ryan 104k • written 6.9 years ago by Wet&DryImmunology ▴ 240

2

Entering edit mode

Linking to the question posted on Bioconductor support:

https://support.bioconductor.org/p/96381/#96405

ADD REPLY • link 6.9 years ago by Michael Love ★ 2.6k

score 1 · Answer 1 · 2017-05-29

1

Entering edit mode

6.9 years ago

Devon Ryan 104k

If your suspect that there are global changes in a feature (in this case, a histone modification), then you can't rely on the normalization methods built into any RNAseq package. For that, you need to (A) normalize only to non-peak regions or (B) normalize to an ideally off-species spike-in. In the former case, I presume that's what diffbind would do, which is what you should be using anyway.

ADD COMMENT • link 6.9 years ago by Devon Ryan 104k

0

Entering edit mode

@Ryan, thank you for your suggestion. I guess (B) the off-specifies spike-in is the definitive solution. For (A) could you explain a little more? it did not seem to be obvious to me why normalized to non peak region would address the issue. I did some homework about "Diffbind", it seems to me that "Diffbind" also relied on "DEseq2" (or "EdgeR"), to compare experiment group vs control group, dba.count documentation says: "Note that all raw read counts are maintained for use by dba.analyze, regardless of how this is set", so the default workflow of Diffbind seems to use raw count. (I guess this is because DEseq2 requires raw counts without any normalization as inputs). Without consideration of performing differential analysis, dba.count does provide many options to normalized count data, but I could not find any argument performing "normalize to non-peak regions"... If I sloppily missed something, please let me know...

ADD REPLY • link 6.9 years ago by Wet&DryImmunology ▴ 240

0

Entering edit mode

Coverage in non-peak regions shouldn't be affected by global changes in the signal, which is what causes issues in all of the normal normalization methods.

I've never dug into diffbind to see if this is what it's doing. If it's not, then randomly select 10000 or so non-peak regions, count in those and put them into DESeq2. Get the scaling factor for that and then apply it to your peak counts. This isn't perfect, since one could consider cases where the underlying histone modification level is affecting the background coverage rate, but it'll be better than what you're currently doing.

ADD REPLY • link 6.9 years ago by Devon Ryan 104k