Question: Quantifying DNA methylation from Bisulfite-Seq Data
Ali (Iran, Islamic Republic Of) wrote, 6.4 years ago:

Sodium bisulfite treatment is the gold standard for measuring the level of DNA methylation. It converts unmethylated cytosines to uracils (which are read as thymines after PCR) but leaves methylated cytosines unchanged.

Here is my question. Suppose we have the following DNA fragment (methylated cytosines are in upper case, C; unmethylated cytosines are in lower case, c):

5' ACGATGc 3' (Top strand)                          3' TGCTAcG 5' (Bottom strand)

After bisulfite treatment we will have:

5' ACGATGT 3' (Top strand)                         3' TGCTATG 5' (Bottom strand)

After PCR, each of the top and bottom strands becomes double-stranded DNA, as shown below. Strands 1 and 2 are complementary and derived from the top strand above; strands 3 and 4 are complementary and derived from the bottom strand.

5' ACGATGT 3' (1, Top strand, forward)           3' TGCTATG 5' (3, Bottom strand, forward)

3' TGCTACA 5' (2, Top strand, reverse)            5' ACGATAC 3' (4, Bottom strand, reverse)
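The four strands above can be reproduced with a small simulation. This is only a sketch of the example in this post: the function names are made up for illustration, every strand is stored 5'->3', and bisulfite conversion plus PCR is collapsed into a single c->T replacement.

```python
# Toy simulation of the example: lower-case 'c' = unmethylated, 'C' = methylated.
# All sequences are stored 5'->3'. Function names are illustrative only.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def bisulfite_convert(strand: str) -> str:
    """Unmethylated 'c' ends up read as 'T'; methylated 'C' is left unchanged."""
    return strand.replace("c", "T")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, also written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


top = "ACGATGc"      # 5' ACGATGc 3'
bottom = "GcATCGT"   # 3' TGCTAcG 5' written 5'->3'

top_bs = bisulfite_convert(top)
bottom_bs = bisulfite_convert(bottom)

strands = {
    1: top_bs,                         # original top
    2: reverse_complement(top_bs),     # PCR copy of the converted top
    3: bottom_bs,                      # original bottom
    4: reverse_complement(bottom_bs),  # PCR copy of the converted bottom
}

for number, seq in strands.items():
    print(number, "5'", seq, "3'")
```

Strands 2 and 3 print reversed relative to the layout above because they are written 5'->3' here rather than 3'->5'.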

Now we have 4 different strands that align to the same genomic location (on either the forward or the reverse strand). The problem is that each of them gives a different measurement of DNA methylation. For instance, in sequence (1), ACGATGT, the last base is T, meaning an unmethylated cytosine, which is correct. However, in strand (4), ACGATAC, the last letter is C, which would mean a methylated cytosine, and that is wrong.

How can one infer the correct methylation status of each base from the 4 different reads?



Devon Ryan (Freiburg, Germany) wrote, 6.4 years ago:

This is the reason that those of us who have written BS-seq aligners have more grey hair than we should.

The answer comes from the orientation of a read after alignment and from the conversions applied to the read and to the genome to get it to align. In short, if you convert a read C->T in silico (I'll just use single-end examples) and it aligns in the forward orientation to a C->T converted genome, then it originated from the original top strand. If it aligns in the reverse orientation to the G->A converted genome, then it came from the original bottom strand. The other two strands (we generally call these "complementary to the original top" and "complementary to the original bottom") follow similarly. I recall that Felix Krueger has a nice illustration of all of this in the RRBS guide for Bismark (it's the same for WGBS and RRBS; have a look starting at page 6, I think).
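To make the conversion logic concrete, here is a toy sketch that classifies the four reads from the question. It is not how any real aligner is implemented: exact substring matching stands in for alignment, the function names and the OT/OB/CTOT/CTOB labels are shorthand chosen here, and the pairing used for the two complementary strands (G->A converted read) is my reading of "follow similarly", so treat it as an assumption.

```python
# Toy strand-of-origin classifier using fully converted reads and genomes.
# Exact matching replaces real alignment; names and labels are illustrative.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def c_to_t(seq: str) -> str:
    return seq.replace("C", "T")


def g_to_a(seq: str) -> str:
    return seq.replace("G", "A")


def classify_read(read: str, genome_top: str) -> list:
    """Return every strand label whose conversion/orientation pair matches."""
    ct_genome = c_to_t(genome_top)
    ga_genome = g_to_a(genome_top)
    candidates = [
        ("OT", c_to_t(read), ct_genome),             # forward, C->T genome
        ("OB", c_to_t(read), revcomp(ga_genome)),    # reverse, G->A genome
        ("CTOT", g_to_a(read), revcomp(ct_genome)),  # reverse, C->T genome
        ("CTOB", g_to_a(read), ga_genome),           # forward, G->A genome
    ]
    return [label for label, r, g in candidates if r in g]


genome = "ACGATGC"  # unconverted top strand of the example, 5'->3'

# The four strands from the question, each written 5'->3' as sequenced.
for read in ("ACGATGT", "ACATCGT", "GTATCGT", "ACGATAC"):
    print(read, classify_read(read, genome))
```

With a sequence this short, a unique match is partly luck; real genomes need a proper aligner and a rule for ambiguous hits.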

Once you know from which strand a read arose, you can then determine what residue it's giving information for.
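As a rough illustration of that last step, the sketch below makes calls for reads (1) and (4) from the question. It assumes the read is already reported in top-strand orientation, aligned without gaps, and that its strand of origin is known; `call_methylation` and its arguments are hypothetical names, not part of any tool.

```python
# Sketch: methylation calls once the strand of origin is known.
# Read and reference are both 5'->3' in top-strand orientation, same length.


def call_methylation(reference_top: str, read: str, origin: str) -> dict:
    """Map 0-based reference position -> call.

    origin is "top" (original top or its complement) or
    "bottom" (original bottom or its complement).
    """
    if origin == "top":
        # Informative positions are reference Cs: C stays C if methylated.
        ref_base, methylated, unmethylated = "C", "C", "T"
    elif origin == "bottom":
        # Bottom-strand Cs appear as G on the top strand: G stays G if
        # methylated and becomes A if unmethylated.
        ref_base, methylated, unmethylated = "G", "G", "A"
    else:
        raise ValueError("origin must be 'top' or 'bottom'")

    calls = {}
    for pos, (ref, base) in enumerate(zip(reference_top, read)):
        if ref != ref_base:
            continue  # this position says nothing about this strand's Cs
        if base == methylated:
            calls[pos] = "methylated"
        elif base == unmethylated:
            calls[pos] = "unmethylated"
        # anything else is a mismatch or sequencing error; no call is made
    return calls


reference = "ACGATGC"
print(call_methylation(reference, "ACGATGT", "top"))
print(call_methylation(reference, "ACGATAC", "bottom"))
```

Read (4) therefore says nothing about the last C of the top strand: its final C sits opposite a bottom-strand G, and the read only reports on the two bottom-strand cytosines (the Gs in top-strand coordinates).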
