Typical cutoffs for hyper- and hypo-methylation?
Hi all,

I'm doing an MBD-seq experiment with 4 samples: Sample A enriched, Sample A input, Sample B enriched, and Sample B input. After running the R package MEDIPS to obtain RPKM values, I'd like to determine the hyper- and hypo-methylation status of Sample A vs. Sample B within 5 kb (5,000 bp) upstream of gene TSSs. Would the following be a reasonable definition of hyper- and hypo-methylation in this context?

Hypermethylation: ((Sample A enrich RPKM)/(Sample A input RPKM))/((Sample B enrich RPKM)/(Sample B input RPKM)) > 2

Hypomethylation: ((Sample A enrich RPKM)/(Sample A input RPKM))/((Sample B enrich RPKM)/(Sample B input RPKM)) < 0.5
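For concreteness, the two rules above can be written as one classification on the ratio of enrichment ratios. This is a minimal sketch, not part of MEDIPS: the function names are hypothetical, and the small pseudocount (added so that an input or enriched RPKM of 0 does not cause a division by zero) is an assumption. Note that the cutoffs 2 and 0.5 are symmetric on the log2 scale (+1 and -1).

```python
import math


def enrichment_ratio(enrich_rpkm, input_rpkm, pseudocount=0.1):
    """Enriched/input RPKM for one sample.

    The pseudocount is an assumption (not in the original rule) to avoid /0.
    """
    return (enrich_rpkm + pseudocount) / (input_rpkm + pseudocount)


def classify_region(a_enrich, a_input, b_enrich, b_input, cutoff=2.0, pseudocount=0.1):
    """Label a region 'hyper', 'hypo' or 'unchanged' in Sample A relative to Sample B.

    Returns (label, ratio, log2_ratio). The default cutoff of 2 reproduces the
    '> 2' / '< 0.5' rule from the question.
    """
    ratio = (enrichment_ratio(a_enrich, a_input, pseudocount)
             / enrichment_ratio(b_enrich, b_input, pseudocount))
    log2_ratio = math.log2(ratio)
    if ratio > cutoff:
        return "hyper", ratio, log2_ratio
    if ratio < 1 / cutoff:
        return "hypo", ratio, log2_ratio
    return "unchanged", ratio, log2_ratio


print(classify_region(20, 5, 4, 4))   # Sample A enriched ~4x over input, Sample B ~1x
```

Applied per gene to the upstream-region RPKMs, this reproduces the proposed rule exactly, including its weakness: nothing in it accounts for how noisy a ratio built from low RPKM values is.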

No, I recommend against this strategy, for the following reason: NGS-based experiments always suffer from a mean-variance dependency, which means that fold changes tend to be (falsely) large when read counts (or, here, RPKM values) are low, e.g. because of low sequencing depth in a specific region (for whatever reason: poor representation in the library, GC content, etc.). Naive fold changes such as RPKM_condition1/RPKM_condition2 will therefore produce a high number of false positives. It is generally a better strategy to use a proper statistical framework that takes this into account and also controls the false discovery rate (the multiple-testing problem). As far as I know, MEDIPS has a function for this (using edgeR internally); please check the manual for "differential analysis". I hope you have experimental replicates.
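To illustrate the mean-variance point, the sketch below simulates windows with no true difference between conditions (both drawn from the same Poisson mean) and measures how often a naive 2-fold cutoff still calls them hyper or hypo. The Poisson model, the pseudocount of 1, and the chosen means are illustrative assumptions; real MBD-seq counts are usually overdispersed (one reason edgeR uses a negative binomial model), so this likely understates the problem.

```python
import numpy as np


def naive_false_positive_rate(mean_count, n_windows=100_000, cutoff=2.0, seed=0):
    """Fraction of null windows called hyper/hypo by a naive fold-change cutoff.

    Both conditions share the same true mean, so every call is a false positive.
    """
    rng = np.random.default_rng(seed)
    cond1 = rng.poisson(mean_count, n_windows)
    cond2 = rng.poisson(mean_count, n_windows)
    # Pseudocount of 1 (an assumption) avoids division by zero for empty windows.
    fold_change = (cond1 + 1) / (cond2 + 1)
    called = (fold_change > cutoff) | (fold_change < 1 / cutoff)
    return called.mean()


for mean in (2, 10, 50, 200):
    rate = naive_false_positive_rate(mean)
    print(f"mean reads per window = {mean:>3}: {rate:.3f} of null windows pass the 2-fold cutoff")
```

The false-call rate is substantial at low counts and essentially zero at high counts, even though no window differs in truth. A count-based test instead judges each window's difference against the variability expected at its count level, and FDR control then handles the many windows tested.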

Thanks, those are all valid points. Yes, I'm using MEDIPS + edgeR to find windows with significant differential methylation. However, for a crude (non-statistical) definition of hyper- and hypo-methylation within 5 kb upstream of the TSS, would the above definition (> 2 and < 0.5) be totally incorrect, or is it just unconventional?

Is there a more robust way to define hyper- and hypo-methylation within 5 kb upstream of a TSS?

Edit: What I mean is that I'm not necessarily looking for a definition of regions of significant hyper/hypo-methylation, but rather an operational definition of hyper/hypo-methylation for the region X bp upstream of a TSS.
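One common operational choice for "X bp upstream of a TSS" is a strand-aware interval: upstream lies at lower coordinates for a + strand gene and at higher coordinates for a - strand gene. The sketch below assumes 0-based, half-open coordinates, windows given as (start, end, value) tuples on the same chromosome as the gene, and a mean as the per-region summary; the function names are hypothetical and not part of MEDIPS.

```python
from typing import List, Tuple


def upstream_region(tss: int, strand: str, distance: int = 5000) -> Tuple[int, int]:
    """Half-open [start, end) interval covering `distance` bp upstream of a 0-based TSS."""
    if strand == "+":
        # Plus-strand genes: upstream is at lower coordinates; clip at the chromosome start.
        return max(0, tss - distance), tss
    if strand == "-":
        # Minus-strand genes: upstream is at higher coordinates.
        return tss + 1, tss + 1 + distance
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def mean_over_region(windows: List[Tuple[int, int, float]], region: Tuple[int, int]) -> float:
    """Mean value of the windows that overlap the half-open region (NaN if none overlap)."""
    start, end = region
    values = [v for (w_start, w_end, v) in windows if w_start < end and w_end > start]
    return sum(values) / len(values) if values else float("nan")


windows = [(0, 500, 1.0), (500, 1000, 3.0), (1000, 1500, 10.0)]
print(upstream_region(1000, "+"))                # clipped at 0
print(mean_over_region(windows, (0, 1000)))      # averages the first two windows
```

Computing this summary for each of the four samples (enriched and input for A and B) gives per-gene values that could feed a ratio rule like the one in the question; genes with several annotated TSSs would still need a choice of which TSS to use.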