Why is a sliding window applied to CpG methylation level calculation?
7.7 years ago
hxlei613 ▴ 100

In the article "The DNA methylation landscape of human early embryos", the authors mention a 100-bp-tile-based DNA methylation calling algorithm (they used RRBS to detect 5mC/5hmC).

The algorithm is described like this: first, the genome is binned into consecutive 100-bp tiles. The number of reported Cs, divided by the total number of reported Cs and Ts captured in a 100-bp tile, is interpreted as the tile-averaged DNA methylation level. The DNA methylation level of each sample is the average over its 100-bp tiles.
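To make the description above concrete, here is a minimal Python sketch of tile-based calling. It is only an illustration of the algorithm as paraphrased in this post, not the authors' code: the input format (one `(chrom, pos, n_C, n_T)` tuple per covered cytosine, 0-based positions) and the function names `tile_methylation` and `sample_methylation` are assumptions made for the example.

```python
from collections import defaultdict

def tile_methylation(calls, tile_size=100):
    """Pool C and T read counts per fixed tile and return C / (C + T) per tile.

    calls: iterable of (chrom, pos, n_C, n_T), one entry per covered cytosine
    (hypothetical input format; positions assumed 0-based).
    """
    counts = defaultdict(lambda: [0, 0])  # (chrom, tile index) -> [C reads, C + T reads]
    for chrom, pos, n_c, n_t in calls:
        key = (chrom, pos // tile_size)
        counts[key][0] += n_c
        counts[key][1] += n_c + n_t
    # Tiles without any reads carry no information and are skipped.
    return {key: c / total for key, (c, total) in counts.items() if total > 0}

def sample_methylation(tile_levels):
    """Sample-level methylation as the unweighted mean over tiles."""
    return sum(tile_levels.values()) / len(tile_levels)

# Toy data: two cytosines fall in tile 0 of chr1, one in tile 1.
calls = [
    ("chr1", 10, 9, 1),
    ("chr1", 50, 1, 9),
    ("chr1", 120, 2, 0),
]
tiles = tile_methylation(calls)
sample_level = sample_methylation(tiles)
print(tiles, sample_level)
```

Note that the reads are pooled within a tile before dividing, so a cytosine covered by only two reads (like the one at position 120) never gets a noisy per-site estimate of its own; it only contributes to its tile.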

Why can't we just average the methylation level of every individual C? What's the advantage of a sliding window?

Thank you :)

methylation • 3.2k views
7.7 years ago
natasha.sernova ★ 4.0k

It's a tradition.

See this paper:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3592415/

“Sliding window is a traditional method for pre-defined regions that are arbitrarily chosen and not taken the actual methylation status of CpGs into consideration.”

or this one:

https://www.bioconductor.org/packages/release/bioc/vignettes/MethTargetedNGS/inst/doc/MethTargetedNGS.pdf

In its chapter 1.3, Methylation Entropy, a sliding window is also used:

"This function return vector of methylation entropy values using sliding window of 4."


I found that the BSmooth paper (http://www.ncbi.nlm.nih.gov/pubmed/23034175) provides a justification for the use of smoothing:

This has led most WGBS studies to employ a high coverage design since even 30× coverage yields standard errors as large as 0.09. However, various authors have noted that methylation levels are strongly correlated across the genome [24,25]. Furthermore, functionally relevant findings are generally associated with genomic regions rather than single CpGs, either CpG islands [26], CpG island shores [27], genomic blocks [1], or generic 2 kb regions [3].

They then concluded the following:

Using this method [BSmooth] on data with 4× coverage, we achieved precision comparable to deeper coverage without smoothing.

So my guess is that one answer could be that smoothing/windowing allows lower-coverage sequencing while still giving low standard errors for the (averaged/smoothed) DNA methylation level. This of course comes at the cost of resolution: individual CpGs are no longer resolved.
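A rough back-of-the-envelope check of this argument, as a sketch only: it assumes reads at a CpG are independent binomial draws, takes the worst case of a true methylation level of 0.5, and treats neighbouring CpGs in a window as sharing the same level. The window size of 8 CpGs and the helper name `binomial_se` are made up for illustration, and BSmooth's actual smoother is more elaborate than this simple pooling.

```python
import math

def binomial_se(p, n_reads):
    """Standard error of a methylation estimate from n_reads binomial reads."""
    return math.sqrt(p * (1.0 - p) / n_reads)

p = 0.5  # worst case: the variance p * (1 - p) is largest here

se_30x = binomial_se(p, 30)            # single CpG at 30x coverage
se_4x = binomial_se(p, 4)              # single CpG at 4x coverage
se_4x_pooled = binomial_se(p, 4 * 8)   # 8 neighbouring CpGs at 4x, pooled (assumed window)

print(round(se_30x, 3), round(se_4x, 3), round(se_4x_pooled, 3))
```

Under these assumptions a single CpG at 30× gives a standard error of about 0.09, matching the figure quoted above, while a single CpG at 4× gives 0.25. Pooling the reads of several neighbouring CpGs at 4× brings the standard error back down to roughly the 30× single-site value, which is the trade of resolution for precision described here.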


My guess is that they once had a dataset with either low coverage or a lot of noise. The sliding window would allow you to handle that and still assign values to focal regions/points. There's no other good reason that I know of to do this and it's not something I would personally do by default.
