Question: Calculating Fractional Methylation Of CpG Site From BED Files Containing Read Counts
akz99 wrote 6.2 years ago:

Hi, I have a bunch of RRBS methylation files that report, for each position on the chromosome, the number of reads that map there and the percentage of those reads that show the site as methylated. How do I translate this information into a fractional methylation file for each CpG locus?

For example, here is one of the files I obtained from the UCSC Genome Browser for HeLa:

Chromosome | Start Pos | End Pos | ... | # reads | % reads methylated

chr1 100503660 100503661 SL657_RRBS 65 + 100503660 100503661 0,255,0 65 3
chr1 100503676 100503677 SL657_RRBS 65 + 100503676 100503677 0,255,0 65 3
chr1 100503677 100503678 SL657_RRBS 66 - 100503677 100503678 0,255,0 66 0
chr1 100503679 100503680 SL657_RRBS 65 + 100503679 100503680 0,255,0 65 3
chr1 100503680 100503681 SL657_RRBS 66 - 100503680 100503681 0,255,0 66 2
chr1 100503683 100503684 SL657_RRBS 65 + 100503683 100503684 0,255,0 65 0

The important parts of the data are the last two columns: the second-to-last column is the number of reads mapped to that position on the chromosome, and the last column is the percentage of those reads that are methylated. Suppose I know there is a CpG site spanning positions 100503676-100503681. How would I figure out the fractional methylation of this site (where the fractional methylation is the percentage of the SITE that is methylated, not the percentage of the reads at the site that are methylated)? I am not sure whether to set a threshold above which the % reads methylated corresponds to "yes, the position is methylated", or to use some other method.

Modified 3.1 years ago by Biostar • written 6.2 years ago by akz99

You could probably modify the merge_CpGs program from bison to convert that to approximate per-CpG metrics. It's a bit annoying that CHG/CHH and CpG sites are likely mixed together in those files, so you would need to avoid writing out the non-CpG ones. The original program takes as input a bedGraph with columns containing read counts, but you can calculate the methylated read count easily enough from the total read count and the percentage of reads methylated.

It might be easier to write a quick Python version to just do what you want, though :oP
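
For illustration, a quick Python version might look roughly like the sketch below. It is not the merge_CpGs code, only a minimal example under stated assumptions: the last two columns are the read count and the percent of reads methylated, column 6 is the strand, and a + strand record at start s together with a - strand record at start s + 1 are treated as the two strands of one CpG. Without a reference sequence it cannot confirm that a pair really is a CpG, so unpaired records (which may be CHG/CHH sites or CpGs covered on one strand only) are simply skipped. The names `parse_line`, `merge_strands` and `region_fraction` are made up for this example.

```python
from collections import namedtuple

Site = namedtuple("Site", "chrom start strand reads meth")

EXAMPLE = """\
chr1 100503660 100503661 SL657_RRBS 65 + 100503660 100503661 0,255,0 65 3
chr1 100503676 100503677 SL657_RRBS 65 + 100503676 100503677 0,255,0 65 3
chr1 100503677 100503678 SL657_RRBS 66 - 100503677 100503678 0,255,0 66 0
chr1 100503679 100503680 SL657_RRBS 65 + 100503679 100503680 0,255,0 65 3
chr1 100503680 100503681 SL657_RRBS 66 - 100503680 100503681 0,255,0 66 2
chr1 100503683 100503684 SL657_RRBS 65 + 100503683 100503684 0,255,0 65 0
"""


def parse_line(line):
    """Parse one whitespace-separated record.

    Assumption: the last two columns are total reads and percent methylated.
    """
    fields = line.split()
    reads = int(fields[-2])
    percent = float(fields[-1])
    # The percentage is already rounded, so this count is only approximate.
    meth = round(reads * percent / 100)
    return Site(fields[0], int(fields[1]), fields[5], reads, meth)


def merge_strands(sites):
    """Pool counts of a + record at s with a - record at s + 1 (assumed CpG).

    Returns tuples (chrom, start, end, methylated, reads, fraction).
    Records without a partner on the other strand are skipped.
    """
    minus = {(s.chrom, s.start): s for s in sites if s.strand == "-"}
    merged = []
    for s in sites:
        if s.strand != "+":
            continue
        mate = minus.get((s.chrom, s.start + 1))
        if mate is None:
            continue
        reads = s.reads + mate.reads
        meth = s.meth + mate.meth
        if reads == 0:
            continue
        merged.append((s.chrom, s.start, mate.start + 1, meth, reads, meth / reads))
    return merged


def region_fraction(merged, chrom, start, end):
    """Pool all merged CpGs that lie fully inside [start, end) on chrom."""
    inside = [m for m in merged if m[0] == chrom and m[1] >= start and m[2] <= end]
    reads = sum(m[4] for m in inside)
    meth = sum(m[3] for m in inside)
    return meth / reads if reads else None


sites = [parse_line(line) for line in EXAMPLE.splitlines() if line.strip()]
for cpg in merge_strands(sites):
    print(*cpg, sep="\t")
```

On the six example lines this pairs 100503676 with 100503677 (2 of 131 reads methylated) and 100503679 with 100503680 (3 of 131), and skips the two unpaired + strand records. Pooling read counts this way gives a fraction directly, without needing a yes/no threshold per position; `region_fraction(merge_strands(sites), "chr1", 100503676, 100503681)` pools both pairs into 5 of 262 reads.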

Modified 6.2 years ago • written 6.2 years ago by Devon Ryan