There won't be an exact list of CpGs covered by RRBS, because coverage depends on the exact enzymes used and on how tightly you perform size selection. I would propose the following procedure:
1. Use Biopython to determine all possible fragments generated by the restriction enzymes you'll be using (that package has some convenient functions for performing restriction digests on sequences).
2. Decide on a rough range of sequenceable fragment sizes, which will likely be something like 75-500 bases.
3. Choose a read length (N), because all of the results will depend on it.
4. For each fragment in the size range from step 2, write the regions corresponding to its first and last N bases to a file in BED format.
5. Load the BED file from step 4 into an interval tree (there might be something in Biopython for this; worst case, you can use deeptoolsintervals from deepTools).
6. Use Biopython to iterate over the CpGs in the genome sequence and query each one for overlap with the interval tree from step 5.
7. Write the covered CpGs to output files.
8. Compare them with the CpGs that the EPIC 850K covers.
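A minimal, dependency-free sketch of the digest, size-selection, read-end and CpG-overlap steps above, for a single sequence. It assumes MspI (C^CGG) as the only enzyme, keeps only fragments with a cut site at both ends, and counts a CpG as covered when its cytosine (top-strand, 0-based coordinate) falls inside a read-end region; strand, end-repair fill-in and mappability are ignored. To keep it runnable without extra packages, a plain string search stands in for `Bio.Restriction` (e.g. `MspI.search`), and a sorted list queried with `bisect` stands in for the interval tree. All function names are hypothetical.

```python
import bisect


def find_cut_sites(seq, site="CCGG", cut_offset=1):
    """0-based top-strand cut positions; defaults assume MspI (C^CGG)."""
    seq = seq.upper()
    cuts, i = [], seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)  # cut falls cut_offset bases into the site
        i = seq.find(site, i + 1)
    return cuts


def size_selected_fragments(cuts, min_len=75, max_len=500):
    """Fragments flanked by cut sites on both ends, kept if within the size range."""
    return [(s, e) for s, e in zip(cuts, cuts[1:]) if min_len <= e - s <= max_len]


def read_end_intervals(fragments, read_len):
    """First/last read_len bases of each fragment, sorted and merged (BED-style)."""
    ends = []
    for s, e in fragments:
        ends.append((s, min(s + read_len, e)))
        ends.append((max(e - read_len, s), e))
    ends.sort()
    merged = []
    for s, e in ends:
        if merged and s <= merged[-1][1]:
            # Overlapping or touching: extend the previous interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def covered_cpgs(seq, intervals):
    """0-based positions of CpG cytosines lying inside one of the sorted intervals."""
    seq = seq.upper()
    starts = [s for s, _ in intervals]
    hits = []
    for pos in range(len(seq) - 1):
        if seq[pos:pos + 2] != "CG":
            continue
        # Last interval starting at or before pos; covered if pos is before its end.
        i = bisect.bisect_right(starts, pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            hits.append(pos)
    return hits


def to_bed(chrom, intervals):
    """Format intervals as tab-separated BED lines (0-based, half-open)."""
    return "".join(f"{chrom}\t{s}\t{e}\n" for s, e in intervals)


if __name__ == "__main__":
    # Toy sequence with three MspI sites and one internal CpG.
    seq = "AAAA" + "CCGG" + "T" * 50 + "CG" + "T" * 48 + "CCGG" + "A" * 10 + "CCGG" + "TTTT"
    frags = size_selected_fragments(find_cut_sites(seq))
    ends = read_end_intervals(frags, read_len=36)
    print(to_bed("chrToy", ends), end="")
    print(covered_cpgs(seq, ends))  # -> [5]
```

With 36-base reads only the CpG at the fragment start is covered; the internal CpG at position 58 is reached only once the read length grows to about 60. For a real genome you would loop over `Bio.SeqIO.parse(fasta_path, "fasta")`, passing `str(record.seq)` and `record.id`; with several enzymes, collect and sort the cut positions of all of them before building fragments.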
Note that Illumina may give a ballpark estimate of all of this in its EPIC 850K sales materials. I wouldn't be surprised if the EPIC 850K covers some CpGs that RRBS doesn't.
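To check that last point directly, the two sets can be compared in both directions. This is a sketch assuming both the RRBS-covered CpGs and the EPIC CpGs are available as BED lines with one CpG cytosine per line, 0-based, on the same genome build (the EPIC positions would first need to be extracted from Illumina's manifest and converted accordingly). The helper names are hypothetical.

```python
def bed_cpg_positions(lines):
    """Set of (chrom, 0-based start) tuples from BED-formatted lines."""
    positions = set()
    for line in lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start = line.split("\t")[:2]
        positions.add((chrom, int(start)))
    return positions


def compare_coverage(rrbs, epic):
    """Counts of CpGs covered by both platforms or by only one of them."""
    return {
        "both": len(rrbs & epic),
        "rrbs_only": len(rrbs - epic),
        "epic_only": len(epic - rrbs),
    }


if __name__ == "__main__":
    rrbs = bed_cpg_positions(["chr1\t100\t101\n", "chr1\t200\t201\n"])
    epic = bed_cpg_positions(["chr1\t200\t201\n", "chr1\t300\t301\n"])
    print(compare_coverage(rrbs, epic))  # {'both': 1, 'rrbs_only': 1, 'epic_only': 1}
```

A non-zero `epic_only` count is exactly the case where the array probes CpGs that the simulated RRBS library would not reach.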