Newbie questions about Methyl-Seq (Bismark)
3.8 years ago
wester1086


I'm quite new to epigenetics and DNA methylation data analysis, so any help from someone who has worked on this before would be appreciated. Apologies in advance if my questions seem naive or have already been answered elsewhere.

I was recently tasked with analysing targeted bisulfite sequencing (BS-Seq) data from human patients.

The patients were sequenced in three different runs. The library was prepared with Illumina's TruSeq Methyl Capture EPIC Library Prep Kit (107 Mb, 3,340,894 CpG sites) and sequenced on a NextSeq 500. The data are paired-end (R1 and R2 fastq files).

After initial QC (FastQC) and adapter trimming (Trim Galore!), I aligned my fastq files to the reference genome (UCSC hg19). I used Bismark (v0.19.0) for alignment, deduplication and methylation calling. All patients were analysed with the same workflow.
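For the downstream steps it helps to look at the per-CpG output directly. This is a minimal sketch, assuming the tab-separated coverage file that `bismark_methylation_extractor --bedGraph` writes (chromosome, start, end, methylation percentage, methylated count, unmethylated count). The `CpG`/`read_cov` names, the `min_cov` threshold of 10 and the two example lines are hypothetical, chosen only for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class CpG(NamedTuple):
    chrom: str
    pos: int     # 1-based position as written in the coverage file
    meth: int    # reads supporting a methylated C
    unmeth: int  # reads supporting an unmethylated C

    @property
    def coverage(self) -> int:
        return self.meth + self.unmeth

    @property
    def beta(self) -> float:
        # Fraction of reads reporting methylation at this CpG
        return self.meth / self.coverage


def read_cov(lines: Iterable[str], min_cov: int = 10) -> Iterator[CpG]:
    """Yield CpGs from Bismark coverage lines, keeping sites with >= min_cov reads."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, start, _end, _pct, meth, unmeth = line.rstrip("\n").split("\t")
        site = CpG(chrom, int(start), int(meth), int(unmeth))
        if site.coverage >= min_cov:
            yield site


# Made-up example lines; for a real file, iterate over gzip.open(path, "rt")
example = [
    "chr1\t10525\t10525\t80\t8\t2\n",
    "chr1\t10542\t10542\t100\t3\t0\n",  # only 3 reads: filtered out
]
sites = list(read_cov(example))
print(sites[0].chrom, sites[0].pos, sites[0].beta)  # chr1 10525 0.8
```

A coverage filter like this matters when comparing runs: a run that loses most of its reads to deduplication will also lose many CpGs at any fixed threshold.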

What concerns me is the large difference between the runs' Bismark reports, especially in the deduplication rate (75% for Run1!) and in CHG/CHH methylation (none for Run2):

[Bismark reports: Run1, Run2]

I don't really know what to make of this or how much it will affect the downstream analysis. The ultimate goal is a case-control study to identify differentially methylated CpG sites (I was thinking of using methylKit).
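methylKit is an R package, so the sketch below does not reproduce it. It only illustrates the simplest per-CpG comparison: a two-sided Fisher's exact test on methylated/unmethylated read counts pooled per group. As far as I know, methylKit uses this test only when each group has a single sample and otherwise fits a logistic regression. Pooling reads across patients ignores between-patient variability, so treat this as an illustration, not a replacement for a replicate-aware model. The function name and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups (cases, controls); columns are methylated and
    unmethylated read counts at one CpG.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the case row holds x methylated reads
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one;
    # the small tolerance keeps floating-point ties from being dropped.
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical pooled counts at one CpG: cases 18 meth / 2 unmeth, controls 5 / 15
print(fisher_exact_two_sided(18, 2, 5, 15))
```

With roughly 3.3 million CpGs tested, the resulting p-values would still need multiple-testing correction before calling differentially methylated sites.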

Am I doing something wrong, or is it a problem with the initial data?

Any insight will be appreciated.

Tags: alignment, epigenetics, methyl-seq, bismark
3.8 years ago

I suspect the bisulfite treatment in Run1 was not long enough. In general, your percentage of CHH and CHG methylation should be near 0%, and seeing otherwise typically means that something went wrong during library prep.
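Because non-CpG methylation should be close to zero in most human samples, the methylated fraction of CHG/CHH cytosines is a common proxy for incomplete conversion. This is a minimal sketch, assuming the Bismark alignment report contains lines of the form `Total methylated C's in CHH context:<tab><count>` (and the matching "unmethylated" lines). The function names and the report numbers are made up, not taken from either run.

```python
import re

# Assumed line format of a Bismark alignment report, e.g.
# "Total methylated C's in CHH context:\t7000"
LINE = re.compile(r"Total (un)?methylated C's in (CpG|CHG|CHH) context:\s+(\d+)")


def context_counts(report: str) -> dict:
    """Map each context to (methylated, unmethylated) C counts."""
    counts = {}
    for un, ctx, n in LINE.findall(report):
        meth, unmeth = counts.get(ctx, (0, 0))
        counts[ctx] = (meth, unmeth + int(n)) if un else (meth + int(n), unmeth)
    return counts


def conversion_estimate(counts: dict) -> float:
    """Approximate bisulfite conversion as 1 - (methylated fraction of non-CpG Cs).

    Only meaningful if true non-CpG methylation is close to zero in the tissue.
    """
    meth = counts["CHG"][0] + counts["CHH"][0]
    unmeth = counts["CHG"][1] + counts["CHH"][1]
    return 1 - meth / (meth + unmeth)


# Made-up numbers, not taken from either run
report = """Total methylated C's in CpG context:\t900000
Total methylated C's in CHG context:\t3000
Total methylated C's in CHH context:\t7000
Total unmethylated C's in CpG context:\t300000
Total unmethylated C's in CHG context:\t297000
Total unmethylated C's in CHH context:\t993000
"""
counts = context_counts(report)
print(f"{conversion_estimate(counts):.4f}")  # 0.9923
```

Running this on each sample's report and comparing the values across runs would show whether Run1 stands out as a whole.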

I would argue that deduplicating targeted capture data is the wrong approach: it will artificially lower your coverage (unless the samples were sequenced very deeply).
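A rough back-of-the-envelope calculation shows how much depth deduplication can cost. This sketch assumes that the "75%" in Run1 means 75% of alignments were removed as duplicates. It also assumes that every aligned pair lands on the 107 Mb target and that mates do not overlap. The read-pair count and read length are hypothetical.

```python
def mean_target_depth(read_pairs: int, read_len: int, target_bp: int,
                      dup_fraction: float = 0.0) -> float:
    """Rough mean depth over a capture target.

    Simplifying assumptions: every aligned pair falls on target and the two
    mates do not overlap, so each retained pair adds 2 * read_len bases.
    """
    kept_pairs = read_pairs * (1 - dup_fraction)
    return kept_pairs * 2 * read_len / target_bp


TARGET_BP = 107_000_000  # ~107 Mb EPIC capture target (from the question)
pairs, read_len = 30_000_000, 75  # hypothetical sample

before = mean_target_depth(pairs, read_len, TARGET_BP)
after = mean_target_depth(pairs, read_len, TARGET_BP, dup_fraction=0.75)
print(f"{before:.1f}x before, {after:.1f}x after deduplication")
# 42.1x before, 10.5x after deduplication
```

Removing 75% of pairs divides the mean depth by four. That can push many CpGs below a typical minimum-coverage filter.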


Hi Devon,

Thanks for your insight. I will check with the people who did the bisulfite treatment for Run1.

Concerning the deduplication step, I did some research on PubMed, and people do seem to deduplicate targeted-capture Methyl-Seq data:

Epigenetics 2015 - Agilent SureSelect Methyl-Seq kit, Illumina HiSeq2000, alignment and deduplication with Bismark. However, it uses the mouse genome (mm9).

Epigenetics 2016 - SureSelectXT Methyl-Seq Target Enrichment System, HiSeq2000, alignment to the human genome (hg19) and deduplication with Bismark.

Although the prep kits and sequencing platforms differ from mine, the general procedure seems to be the same, which is why I decided to deduplicate after alignment.

Please correct me if I'm wrong.


The question is mostly how exact the enriched regions are. If you end up with tight fragment size distributions and are using small regions for enrichment you're going to run into issues by deduplicating. If that's not the case then it makes sense to do so. What others have done is irrelevant, most people are terrible at analysing data.
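The point about tight fragment sizes and small regions can be made concrete with a simple collision model. This is a sketch under stated assumptions, not how any tool estimates duplication. Independent molecules are assumed to pick uniformly among a limited number of possible (start, end) positions. Molecules that share a position would be collapsed by position-based deduplication (which is what `deduplicate_bismark` does) even though they are distinct. The function name and the position counts are hypothetical.

```python
def expected_duplicate_fraction(n_fragments: int, n_positions: int) -> float:
    """Expected fraction of reads flagged as duplicates purely by chance.

    Simplified model: each of n_fragments independent molecules picks one of
    n_positions equally likely (start, end) pairs. Reads sharing a pair would be
    collapsed by position-based deduplication even though they are distinct.
    """
    distinct = n_positions * (1 - (1 - 1 / n_positions) ** n_fragments)
    return 1 - distinct / n_fragments


# Hypothetical small target: ~300 usable start positions x ~20 common fragment lengths
small = expected_duplicate_fraction(5_000, 300 * 20)
# Same number of fragments spread over a much larger space of positions
large = expected_duplicate_fraction(5_000, 300 * 20 * 1_000)
print(f"{small:.2f} vs {large:.4f}")  # 0.32 vs 0.0004
```

Under these assumptions, roughly a third of genuinely independent reads in the small region would be thrown away. The same number of reads spread over many more positions loses almost nothing.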

