There's no exact list of CpGs covered by RRBS, because coverage depends on the enzyme(s) used and how tightly you perform size selection. I would suggest the following procedure:
1. Use Biopython to determine all possible fragments generated by the restriction enzymes you'll be using (that package has some convenient functions for performing restriction digests on sequences).
2. Determine a rough range of sequenceable fragment sizes, which will likely be something like 75-500 bases.
3. Choose a read length (N), because all of the results will depend on it.
4. For each of the fragments you selected in step 2, write the regions corresponding to its first and last N bases to a file in BED format.
5. Load the BED file from step 4 into an interval tree (there might be something in Biopython for this; worst case, you can use deeptoolsintervals from deepTools).
6. Use Biopython to iterate over the CpGs and query each one for overlaps with the interval tree from step 5.
7. Write the overlapping (covered) CpGs to output files.
8. Compare them with the CpGs the EPIC 850K covers.
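The steps above can be sketched in plain Python for a single sequence. This is a minimal sketch under several assumptions, not a definitive pipeline: it assumes one enzyme, MspI (recognises CCGG, cuts C^CGG), and a 75-500 bp size window. It swaps Biopython's `Bio.Restriction` digest for a plain regex search and swaps the interval tree for a sorted list of merged intervals queried with `bisect`. It also ignores end-repair fill-in of the overhangs. All function names are hypothetical.

```python
import bisect
import re
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open, like BED

# Assumption: a single enzyme, MspI (site CCGG, cut C^CGG).
# Bio.Restriction in Biopython can digest with any enzyme set;
# a plain regex search keeps this sketch dependency-free.
SITE = "CCGG"
CUT_OFFSET = 1  # cut falls after the first base of the site (top strand)


def digest(seq: str) -> List[Interval]:
    """Step 1: in-silico digest of the top strand (overhang fill-in ignored)."""
    cuts = [m.start() + CUT_OFFSET for m in re.finditer(SITE, seq.upper())]
    bounds = [0] + cuts + [len(seq)]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(frags: List[Interval], min_len: int = 75, max_len: int = 500) -> List[Interval]:
    """Step 2: keep fragments inside the assumed sequenceable size window."""
    return [(a, b) for a, b in frags if min_len <= b - a <= max_len]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping or touching intervals."""
    merged: List[Interval] = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def read_windows(frags: List[Interval], read_len: int) -> List[Interval]:
    """Steps 3-4: first and last read_len bases of each fragment, merged."""
    windows = []
    for a, b in frags:
        windows.append((a, min(a + read_len, b)))   # first N bases
        windows.append((max(b - read_len, a), b))   # last N bases
    return merge(windows)


def bed_lines(chrom: str, intervals: List[Interval]) -> List[str]:
    """Step 4: three-column BED records (chrom, start, end)."""
    return [f"{chrom}\t{s}\t{e}" for s, e in intervals]


def covered_cpgs(seq: str, windows: List[Interval]) -> List[int]:
    """Steps 5-6: 0-based positions of CpG cytosines (top strand) inside a window.

    Because windows are merged and sorted, bisect on their starts finds the
    only window that could contain a position (stand-in for an interval tree).
    """
    starts = [s for s, _ in windows]
    hits = []
    for m in re.finditer("CG", seq.upper()):
        pos = m.start()
        i = bisect.bisect_right(starts, pos) - 1
        if i >= 0 and pos < windows[i][1]:
            hits.append(pos)
    return hits


if __name__ == "__main__":
    # Toy sequence: two MspI sites 104 bp apart, plus two internal CpGs.
    seq = "A" * 10 + "CCGG" + "A" * 20 + "CG" + "A" * 40 + "CG" + "A" * 36 + "CCGG" + "T" * 20
    frags = size_select(digest(seq))
    windows = read_windows(frags, read_len=30)
    print(frags)                       # [(11, 115)]
    print(bed_lines("chr1", windows))  # ['chr1\t11\t41', 'chr1\t85\t115']
    print(covered_cpgs(seq, windows))  # [11, 34]
```

In the toy example the CpG at position 76 sits in the middle of a 104 bp fragment, beyond the reach of 30 bp reads from either end. It is therefore not counted. For a real genome you would loop over chromosomes (for example with `Bio.SeqIO.parse`), write the `bed_lines` output to a file, and collect the covered positions for comparison with the EPIC manifest.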
Note that the EPIC 850K sales materials may already give a ballpark estimate of this comparison. I wouldn't be surprised if the EPIC 850K covers some CpGs that RRBS doesn't.