I've got bisulfite-sequencing data for two differentiation stages. The raw data was mapped using Bismark. For each CpG site, the methylation ratio was marked as "A/B" (A methylated reads vs B unmethylated reads for this site).
Now I want to compare the overall methylation of a certain region. Assuming this region contains 3 CpG sites, the methylated/unmethylated ratio of each site in the two stages are 45/60, 4/10, 5/15 and 14/65, 3/9, 7/20 respectively.
I think there are two ways to calculate the overall methylation rate of the region:
Method 1: total amount of methylated reads / total number of reads within the region
methylation rate of stage 1 = (45+4+5)/(45+60+4+10+5+15) = 54/139 = 0.39
methylation rate of stage 2 = (14+3+7)/(14+65+3+9+7+20) = 24/118 = 0.20
Method 2: average the methylation rate of each CpG site within the region
methylation rate of stage 1 = (45/(45+60) + 4/(4+10) + 5/(5+15))/3 = 0.32
methylation rate of stage 2 = (14/(14+65) + 3/(3+9) + 7/(7+20))/3 = 0.22
For method 1, I can then calculate the confidence inteval and the P value to know whether 54 out of 139 is significantly different from 24 out of 118.
However, the methylation rate calculated by method 1 is biased toward the site with higher reads coverage.
Method 2 seems to be more robust to indicate the actual methylation rate of the region.
But I don't know which statistics test should I use for method 2 to know the significance.
Thanks in advance.