I am trying to calculate the bisulfite conversion rate for a whole-genome bisulfite sequencing (WGBS) dataset.
The bisulfite conversion rate at each non-CpG cytosine can be calculated as T / (T + C) * 100, where T and C are the numbers of reads showing thymine and cytosine, respectively, at that position.
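The per-base formula translates directly into a small helper. This is a minimal sketch: the function name `conversion_rate` is hypothetical, and it assumes you already have integer T and C read counts for one position.

```python
def conversion_rate(t_reads: int, c_reads: int) -> float:
    """Per-base bisulfite conversion rate (%) = T / (T + C) * 100.

    t_reads: reads showing T (converted) at a non-CpG cytosine position.
    c_reads: reads showing C (unconverted) at the same position.
    """
    total = t_reads + c_reads
    if total == 0:
        # No informative reads: the rate is undefined at this position.
        raise ValueError("position has no informative reads")
    return t_reads / total * 100


# Example: 17 T reads and 1 C read at one CHH position.
print(round(conversion_rate(17, 1), 2))  # 94.44
```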
Below is an excerpt of the file for one sample (CHH context), aligned with Bismark and methylation-called with methylKit:
chrBase chr base strand coverage freqC freqT
scaffold1.1005 scaffold1 1005 F 12 0.00 100.00
scaffold1.1006 scaffold1 1006 F 13 0.00 100.00
scaffold1.1016 scaffold1 1016 F 17 0.00 100.00
scaffold1.1024 scaffold1 1024 F 18 0.00 100.00
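Because the file stores percentages rather than read counts, the counts have to be recovered from coverage. The sketch below assumes whitespace-separated columns in the order shown above and that freqC and freqT are percentages (0 to 100, as the 0.00/100.00 values suggest); the function name `read_counts` is hypothetical.

```python
import io

# The four CHH rows shown above, used as example input.
SAMPLE = """chrBase chr base strand coverage freqC freqT
scaffold1.1005 scaffold1 1005 F 12 0.00 100.00
scaffold1.1006 scaffold1 1006 F 13 0.00 100.00
scaffold1.1016 scaffold1 1016 F 17 0.00 100.00
scaffold1.1024 scaffold1 1024 F 18 0.00 100.00
"""


def read_counts(handle):
    """Yield (chrBase, c_reads, t_reads) for each data row.

    Assumes freqC and freqT are percentages, so read counts are
    coverage * freq / 100, rounded to the nearest integer.
    """
    for line in handle:
        fields = line.split()
        if not fields or fields[0] == "chrBase":
            continue  # skip blank lines and the header row
        coverage = int(fields[4])
        freq_c = float(fields[5])
        freq_t = float(fields[6])
        c_reads = round(coverage * freq_c / 100)
        t_reads = round(coverage * freq_t / 100)
        yield fields[0], c_reads, t_reads


for chr_base, c, t in read_counts(io.StringIO(SAMPLE)):
    print(chr_base, c, t)
```

For a real file, pass an open file handle (for example `open(path)`) instead of the `io.StringIO` wrapper.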
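To calculate the overall conversion rate of this sample, I think I should compute the following (freqC and freqT are percentages, so they are divided by 100 to get read counts):
1. all C = sum of (freqC / 100 * coverage) over all positions in this sample
2. all T = sum of (freqT / 100 * coverage) over all positions in this sample
3. overall conversion rate = all T / (all C + all T) * 100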
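Steps 1 to 3 pool reads across all positions. A minimal sketch of that calculation, which also returns the non-conversion rate as all C / (all C + all T) * 100; the function name `overall_rates` and its `(coverage, freqC, freqT)` input tuples are hypothetical, and freqC/freqT are again assumed to be percentages.

```python
def overall_rates(rows):
    """Pooled conversion and non-conversion rates (%) over all positions.

    rows: iterable of (coverage, freqC, freqT), freqC/freqT in percent.
    The /100 cancels in the ratio, but keeping it makes all_c and all_t
    actual read totals.
    """
    all_c = 0.0
    all_t = 0.0
    for coverage, freq_c, freq_t in rows:
        all_c += coverage * freq_c / 100  # step 1: unconverted (C) reads
        all_t += coverage * freq_t / 100  # step 2: converted (T) reads
    total = all_c + all_t
    if total == 0:
        raise ValueError("no informative reads")
    conversion = all_t / total * 100      # step 3
    non_conversion = all_c / total * 100
    return conversion, non_conversion


# The four rows shown above: 0 C reads and 60 T reads in total.
example = [(12, 0.0, 100.0), (13, 0.0, 100.0), (17, 0.0, 100.0), (18, 0.0, 100.0)]
print(overall_rates(example))  # (100.0, 0.0)
```

Pooling reads weights each position by its coverage. Averaging the per-base percentages instead would give every position equal weight, so the two numbers can differ when coverage is uneven.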
Is this approach correct?
Also, what is the acceptable range of non-conversion rates?
Thank you very much!