Question

Understanding the 'destrand' option in Bioconductor methylKit

0

3.2 years ago

eb13 ▴ 20

Hi all,

I'm working through some DNA methylation data generated by RRBS and I'm trying to figure out what the 'destrand' option means in Bioconductor's methylKit. The documentation states that "Setting destrand=TRUE (the default is FALSE) will merge reads on both strands of a CpG dinucleotide. This provides better coverage, but only advised when looking at CpG methylation (for CpH methylation this will cause wrong results in subsequent analyses)." How is it possible to merge reads from opposite strands? If there is a C present on the + strand at a particular location, the - strand at that same location would have a G and would thus be uninformative for methylation analyses, right?

Maybe I'm just understanding this incorrectly - could someone help explain?

Thank you!

bioconductor methylkit strand • 1.6k views

score 1 · Answer 1 · 2021-02-03

1

3.2 years ago

Papyrus ★ 2.9k

I believe this refers to merging the counts of the two cytosines that make up a CpG site. The C on the + strand and the C on the - strand (which pairs with the G, so it sits one position further along in + strand coordinates) belong to the same CpG site. They are not 2 different CpG sites: whether you read the + strand or the - strand, each in its own 5'->3' direction, you see the same CpG. You can check, after collapsing, that the counts for Cs and Ts (meth and unmeth) match the sum of the counts for those 2 individual cytosines.
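To make the arithmetic concrete, here is a minimal Python sketch of what "merging the counts" amounts to. It is only an illustration of the idea, not methylKit's actual implementation: the `destrand` function, the tuple layout, and the example counts are all made up for this post, and positions are assumed to be in + strand coordinates.

```python
from collections import defaultdict

def destrand(calls):
    """Sum per-cytosine counts from both strands of each CpG (hypothetical helper).

    `calls` is a list of (chrom, pos, strand, num_c, num_t) tuples with `pos`
    in + strand coordinates. A - strand C sits opposite the G of the CpG,
    one base after the + strand C, so it is shifted back by one before summing.
    """
    merged = defaultdict(lambda: [0, 0])
    for chrom, pos, strand, num_c, num_t in calls:
        key = (chrom, pos - 1 if strand == "-" else pos)
        merged[key][0] += num_c  # methylated (C) counts
        merged[key][1] += num_t  # unmethylated (T) counts
    return {key: tuple(counts) for key, counts in merged.items()}

# Made-up example counts
calls = [
    ("chr1", 100, "+", 8, 2),  # C of the CpG on the + strand
    ("chr1", 101, "-", 6, 4),  # C on the - strand, opposite the G
    ("chr1", 250, "+", 3, 1),  # CpG covered on one strand only
]
merged = destrand(calls)
print(merged)
```

The two cytosines at positions 100 and 101 end up as a single CpG entry whose C and T counts are the sums of the individual ones, while the CpG covered on only one strand is carried over unchanged.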

0

Thanks for your reply! This makes sense but I am still unclear about the best way to deal with this option... It seems like merging reads into a single CpG would be good if there is no hemi-methylation and might even be necessary so you don't inflate your sample size with what are essentially duplicate C's.... so in the case of CpGs is it actually required to merge these reads?

Reply • 3.2 years ago by eb13 ▴ 20

0

I'd say the way to handle it depends on the biological question at hand. Indeed, merging strands focuses on the CpG sites and ignores hemimethylation. On the other hand, the great majority of studies on DNA methylation focus on CpG sites and do not study hemimethylation.

Is the study of hemimethylation relevant in your biological system or project? Are you prepared to provide an explanation for finding that a specific C on a CpG site can change while the other C on the other strand does not? Or that they change in an opposite manner?

If you are not specifically looking to study hemimethylation, I'd say to stick to the assumption that most of the epigenetic studies out there make (whether they explicitly say it or not): that CpG methylation is symmetric and thus hemimethylation is not relevant.

For example, the literature is full of DNA methylation array studies. These arrays actually measure strand-specific methylation, but you will find that most studies talk about "CpG site methylation" and not "C methylation".

You are also right that keeping more Cs (if you do not collapse the strands and you're doing differential testing at CpG-site resolution) will increase your multiple testing burden. This won't be as relevant if you're testing for differential regions. Moreover, collapsing to CpG sites will increase your total read count per CpG site. So if you are filtering on coverage (e.g. dropping sites with < 10 counts), you will retain more sites.
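As a rough illustration of both effects (fewer tests, more sites passing a coverage filter), here is a small Python sketch. It is not methylKit code: `sites_passing`, the tuple layout, and the counts are invented for the example, and the threshold of 10 just mirrors the one mentioned above.

```python
def sites_passing(calls, min_cov=10, destrand=False):
    """Return the sites whose total coverage is at least `min_cov` (hypothetical helper).

    `calls` holds (chrom, pos, strand, num_c, num_t) tuples in + strand
    coordinates. With destrand=True, a - strand C is shifted back by one
    base so both cytosines of a CpG share one key and their counts add up.
    """
    coverage = {}
    for chrom, pos, strand, num_c, num_t in calls:
        if destrand:
            key = (chrom, pos - 1 if strand == "-" else pos)
        else:
            key = (chrom, pos, strand)
        coverage[key] = coverage.get(key, 0) + num_c + num_t
    return sorted(key for key, cov in coverage.items() if cov >= min_cov)

# Made-up example: two CpGs, each covered on both strands
calls = [
    ("chr1", 100, "+", 5, 2),  # coverage 7
    ("chr1", 101, "-", 4, 2),  # coverage 6
    ("chr1", 300, "+", 9, 3),  # coverage 12
    ("chr1", 301, "-", 1, 1),  # coverage 2
]
stranded = sites_passing(calls)               # per-cytosine filtering
merged = sites_passing(calls, destrand=True)  # per-CpG filtering
print(len(stranded), len(merged))
```

In this toy case only one of the four individual cytosines reaches 10 reads, whereas both CpG sites pass once the strands are collapsed, and there are two sites to test instead of four.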

So, to sum up, unless specifically interested in hemimethylation, IMO it is OK to merge the strands, especially if you are going to focus on CpG sites. This will also help the testing. As long as you clearly state what you do in your methods...

Reply • 3.2 years ago by Papyrus ★ 2.9k