Question: Define methylation level within X nucleotides
thefirstrealace wrote, 3.2 years ago:

Hello everybody,

I have a BED file containing methylation levels at certain coordinates, generated from BS-Seq data of human spleen cells. Here is a small part of its content:

chr1    10468    id-20250951    0.773585
chr1    10469    id-20250952    0.773585
chr1    10470    id-20250953    0.750000
chr1    10471    id-20250954    0.750000
chr1    10483    id-20250955    0.918033
chr1    10484    id-20250956    0.918033
chr1    10488    id-20250957    0.830769
chr1    10489    id-20250958    0.830769
chr1    10492    id-20250959    0.805556
chr1    10493    id-20250960    0.805556
chr1    10496    id-20250961    0.896104
chr1    10497    id-20250962    0.896104

I need to calculate the methylation levels within 20-nucleotide bins along a certain part of the genome. Let's consider the first six entries (coordinates 10468-10484) as our first 20 nt bin: how is the methylation level defined? Do I have to sum up the first six methylation values and divide by 20 or by 6?

I also heard from a friend that it might be defined as the sum of the first six methylation values divided by the total number of cytosines within the 20 nt bin.

Can someone please shed light on this?

Best regards



Devon Ryan (Freiburg, Germany) wrote, 3.2 years ago:
  1. That's not a BED file, it's some custom format. You can likely make a bedGraph file out of it with awk '{OFS="\t"; $3 = $2+1; print $0}' input.bed > output.bed (this overwrites the ID column with an end coordinate, leaving chromosome, start, end and value).
  2. Don't listen to your friend, he/she is wrong.

Once you have a bedGraph file, you can convert it to a bigWig file (for example with UCSC's bedGraphToBigWig) and then use either bigWigSummary or pyBigWig (if you prefer scripting in Python). Either of these can directly output the average methylation of adjacent 20-base bins in a region.

For what it's worth, the average methylation is the sum divided by the number of entries present in a region. If one were to include positions for which there's no entry then one would be saying that such positions are 0% methylated. This would obviously be a terrible idea.
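To make that definition concrete, here is a minimal pure-Python sketch that bins the twelve rows quoted in the question and averages only over positions that actually have an entry. The helper names (`parse_rows`, `binned_means`) and the choice to start the first bin at 10468 are assumptions made for illustration; this is not what bigWigSummary or pyBigWig do internally.

```python
from collections import defaultdict

# The twelve rows quoted in the question: chrom, position, id, methylation level.
RAW = """\
chr1 10468 id-20250951 0.773585
chr1 10469 id-20250952 0.773585
chr1 10470 id-20250953 0.750000
chr1 10471 id-20250954 0.750000
chr1 10483 id-20250955 0.918033
chr1 10484 id-20250956 0.918033
chr1 10488 id-20250957 0.830769
chr1 10489 id-20250958 0.830769
chr1 10492 id-20250959 0.805556
chr1 10493 id-20250960 0.805556
chr1 10496 id-20250961 0.896104
chr1 10497 id-20250962 0.896104
"""


def parse_rows(text):
    """Parse whitespace-separated rows into (chrom, position, level) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        chrom, pos, _id, level = line.split()
        rows.append((chrom, int(pos), float(level)))
    return rows


def binned_means(rows, region_start, bin_size=20):
    """Mean methylation per bin, dividing by the number of entries present.

    Positions without an entry are ignored rather than counted as 0.
    Bins are anchored at region_start (an assumption for this sketch).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for chrom, pos, level in rows:
        if pos < region_start:
            continue
        offset = (pos - region_start) // bin_size
        key = (chrom, region_start + offset * bin_size)
        sums[key] += level
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


means = binned_means(parse_rows(RAW), region_start=10468)
for (chrom, start), mean in means.items():
    # Print chrom, inclusive bin start and end, and the bin's mean level.
    print(chrom, start, start + 20 - 1, round(mean, 6))
```

For the first bin (10468-10487) the six entries sum to 4.883236, so dividing by 6 gives roughly 0.81, whereas dividing by 20 would give roughly 0.24 and silently treat the 14 positions without data as unmethylated.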

