I just started working with methylation (WGBS) data. I have used Bismark, and it reported methylation for Cs and for the consecutive Gs. I am guessing the G methylation actually comes from the C on the other strand (please correct me if I am wrong). My question is: should I consider both when reporting the methylation in a region, or should I keep only the C rows for further analysis?
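That guess is right for CpGs. In a Bismark-style cytosine report, each CpG appears as two rows: the C on the top (+) strand at position p, and the C on the bottom (-) strand, reported at p+1 (the "consecutive G"). Below is a minimal Python sketch of pooling the two rows into one CpG. It assumes tab-separated rows with the columns chromosome, position (1-based), strand, count methylated, count unmethylated, context, trinucleotide context. The function `merge_cpg_strands` is hypothetical. Bismark's `coverage2cytosine` also has a `--merge_CpG` option intended for this, so check its documentation for your version.

```python
def merge_cpg_strands(lines):
    """Pool top- and bottom-strand calls for each CpG (hypothetical helper).

    Assumes Bismark cytosine-report columns:
    chrom, pos (1-based), strand, n_meth, n_unmeth, context, trinucleotide.
    A '+' C at pos p and a '-' C at pos p+1 are the same CpG, keyed by p.
    """
    merged = {}  # (chrom, CpG start) -> (n_meth, n_unmeth); dicts keep insertion order
    for line in lines:
        chrom, pos, strand, n_meth, n_unmeth, context = line.rstrip("\n").split("\t")[:6]
        if context != "CG":
            continue  # keep only CpG-context cytosines
        pos = int(pos)
        # The bottom-strand C of a CpG sits one base downstream of the top-strand C.
        key = (chrom, pos if strand == "+" else pos - 1)
        m, u = merged.get(key, (0, 0))
        merged[key] = (m + int(n_meth), u + int(n_unmeth))
    return merged


# Hypothetical rows for one CpG: top-strand C at 100, bottom-strand C at 101.
report = [
    "chr1\t100\t+\t8\t2\tCG\tCGA",
    "chr1\t101\t-\t6\t4\tCG\tCGT",
]
for (chrom, pos), (m, u) in merge_cpg_strands(report).items():
    print(chrom, pos, m, u, m / (m + u))  # → chr1 100 14 6 0.7
```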
Thanks for your clear answer. I would like to know why I have to consider both methylation levels (from the C and the subsequent G) in the analysis. Let's say I want to look at the methylation level inside the promoter of a gene. Since genes are strand-specific, doesn't it make sense to take only the Cs (from all CpGs) in the promoter? Similarly, if the gene is on the negative strand, one would take the G methylation levels, which are actually the Cs on the negative strand.
I assume I should take both (the Cs and the subsequent Gs) if I am looking at the methylation level inside a peak (e.g. H3K9ac), because these peaks correspond to modified histones around which both strands of DNA are wrapped, so the presence/occupancy of these peaks is affected by both strands.
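Both kinds of region summary can be sketched with one helper. This minimal Python sketch assumes the CpG calls have already been parsed into `(chrom, pos, strand, n_meth, n_unmeth)` tuples with 1-based positions. `region_methylation` is a hypothetical name. `strand=None` pools both strands, as for a histone-mark peak, while `strand="+"` or `"-"` gives the strand-specific promoter summary proposed above.

```python
def region_methylation(rows, chrom, start, end, strand=None):
    """Pooled methylation fraction over [start, end] (1-based, inclusive).

    rows: iterable of (chrom, pos, strand, n_meth, n_unmeth) CpG calls.
    strand=None pools both strands; '+' or '-' keeps only that strand.
    Returns None if the region has no coverage.
    """
    meth = unmeth = 0
    for c, pos, s, m, u in rows:
        if c != chrom or not (start <= pos <= end):
            continue  # outside the region
        if strand is not None and s != strand:
            continue  # strand-specific mode: skip the other strand
        meth += m
        unmeth += u
    total = meth + unmeth
    return meth / total if total else None


# Hypothetical calls for two CpGs (top-strand C at p, bottom-strand C at p+1).
calls = [
    ("chr1", 100, "+", 8, 2),
    ("chr1", 101, "-", 6, 4),
    ("chr1", 150, "+", 3, 7),
    ("chr1", 151, "-", 1, 9),
]
print(region_methylation(calls, "chr1", 90, 160))       # both strands: 18/40 → 0.45
print(region_methylation(calls, "chr1", 90, 160, "+"))  # top strand only: 11/20 → 0.55
```

Pooling the read counts weights each CpG by its coverage. Averaging the per-CpG methylation levels instead is a common alternative that gives each CpG equal weight. Note also that bottom-strand calls sit at p+1, so a CpG straddling a region boundary can be split between strands.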
Transcription factors and other DNA-interacting proteins don't typically interact with a single strand, but rather with the major or minor groove of double-stranded DNA. It would be rather unusual for a gene to be affected only by methylation on its own strand (not to mention that it's the reverse strand that serves as the template). Further, your coverage for a CpG will be roughly double that of a single C, which greatly aids statistical power.
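To illustrate the coverage point, here is a minimal Python sketch comparing the uncertainty of a single CpG's methylation level with one strand versus both strands pooled. It uses a Wilson score interval for a binomial proportion, which is one common choice and not something Bismark itself reports. The counts are hypothetical, and the helper name `wilson_interval` is illustrative.

```python
import math


def wilson_interval(n_meth, n_total, z=1.96):
    """Wilson score interval for a methylation proportion (about 95% for z=1.96)."""
    if n_total == 0:
        raise ValueError("no coverage")
    p = n_meth / n_total
    denom = 1 + z**2 / n_total
    centre = (p + z**2 / (2 * n_total)) / denom
    half = z * math.sqrt(p * (1 - p) / n_total + z**2 / (4 * n_total**2)) / denom
    return centre - half, centre + half


# Hypothetical counts: one strand alone vs. both strands pooled at the same CpG.
for label, m, n in [("top strand only", 7, 10), ("both strands", 14, 20)]:
    lo, hi = wilson_interval(m, n)
    print(f"{label}: {m}/{n} = {m / n:.2f}, 95% CI [{lo:.2f}, {hi:.2f}], width {hi - lo:.2f}")
```

Both estimates are 0.70, but the pooled interval is noticeably narrower. The width shrinks roughly in proportion to 1/√n, so doubling the coverage helps with any test built on these counts.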