Hi,
I've got bisulfite-sequencing data for two differentiation stages. The raw data was mapped using Bismark. For each CpG site, the methylation ratio was marked as "A/B" (A methylated reads vs B unmethylated reads for this site).
Now I want to compare the overall methylation of a certain region. Assuming this region contains 3 CpG sites, the methylated/unmethylated ratio of each site in the two stages are 45/60, 4/10, 5/15 and 14/65, 3/9, 7/20 respectively.
I think there are two ways to calculate the overall methylation rate of the region:
Method 1: total amount of methylated reads / total number of reads within the region
methylation rate of stage 1 = (45+4+5)/(45+60+4+10+5+15) = 54/139 = 0.39
methylation rate of stage 2 = (14+3+7)/(14+65+3+9+7+20) = 24/118 = 0.20
Method 2: average the methylation rate of each CpG site within the region
methylation rate of stage 1 = (45/(45+60) + 4/(4+10) + 5/(5+15))/3 = 0.32
methylation rate of stage 2 = (14/(14+65) + 3/(3+9) + 7/(7+20))/3 = 0.22
For method 1, I can then calculate the confidence inteval and the P value to know whether 54 out of 139 is significantly different from 24 out of 118.
However, the methylation rate calculated by method 1 is biased toward the site with higher reads coverage.
Method 2 seems to be more robust to indicate the actual methylation rate of the region.
But I don't know which statistics test should I use for method 2 to know the significance.
Please help.
Thanks in advance.
Thanks for your quick and detailed reply, Devon. This is really irradiative to me. I've read the mannual of BiSeq and methylKit. MethylKit seems focusing more on the comparison between simple sites.
BiSeq could give what I need. However, what I have is the whole genome BS-seq data instead of RRBS. I think it's still OK as I've got the CpG methylation counts from Bismark, though the depth might not as deep as RRBS. Am I right?
Yeah, BiSeq will work for WGBS as well. You probably want to filter out sites without at least 4x coverage. Your other option for WGBS would be BSseq (mentioned above) and just doing the paired T-test in regions of interest (I'm not a bit fan of the smoothing methods myself, but that's another story), should you have some.
Thanks indeed!