I'm quite new to epigenetics and DNA methylation data analysis, so any help from someone who has worked on this before would be appreciated. Apologies in advance if my questions seem naive or have already been answered elsewhere.
I was recently tasked with analysing targeted bisulfite sequencing (BS-Seq) data from human patients.
The patients were sequenced across 3 different runs. Libraries were prepared with Illumina's TruSeq MethylCapture EPIC Library prep kit (107 Mb, 3,340,894 CpG sites), and sequencing was performed on a NextSeq 500. The data are paired-end (FASTQ R1 + FASTQ R2).
After the initial QC (FastQC) and adapter trimming (Trim Galore!), I aligned the reads to the reference genome (UCSC hg19). I used Bismark (v0.19.0) for alignment, deduplication and methylation calling. All patients were analysed with the same workflow.
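To make the steps above concrete, here is a minimal Python sketch of what such a paired-end workflow typically looks like as a sequence of commands. It only builds and prints the command lines and does not run anything. All file names and the genome folder are hypothetical placeholders, and the options shown are the common paired-end ones, not necessarily the exact options used for this dataset.

```python
import shlex


def build_commands(r1, r2, trimmed_r1, trimmed_r2, bam, dedup_bam, genome_dir):
    """Return the four command lines for one paired-end sample.

    Every path argument is a placeholder supplied by the caller; the
    intermediate file names are passed in explicitly instead of being
    guessed from each tool's naming scheme.
    """
    return [
        # 1. Adapter and quality trimming of both mates together
        ["trim_galore", "--paired", r1, r2],
        # 2. Bisulfite alignment against a genome prepared for Bismark
        ["bismark", "--genome_folder", genome_dir, "-1", trimmed_r1, "-2", trimmed_r2],
        # 3. Removal of duplicate alignments from the paired-end BAM file
        ["deduplicate_bismark", "--paired", "--bam", bam],
        # 4. Methylation calling in CpG, CHG and CHH context
        ["bismark_methylation_extractor", "--paired-end", "--comprehensive",
         "--bedGraph", dedup_bam],
    ]


if __name__ == "__main__":
    # Hypothetical sample and paths, for illustration only
    commands = build_commands(
        r1="patient01_R1.fastq.gz",
        r2="patient01_R2.fastq.gz",
        trimmed_r1="patient01_R1.trimmed.fq.gz",
        trimmed_r2="patient01_R2.trimmed.fq.gz",
        bam="patient01.bam",
        dedup_bam="patient01.deduplicated.bam",
        genome_dir="/path/to/hg19_bismark/",
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

Keeping the commands as lists makes it easy to check that every sample in every run goes through exactly the same steps before comparing the reports.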
What concerns me is the large difference in the Bismark reports between the runs, especially in the deduplication rate (75% for run1!) and in CHG/CHH methylation (nothing for run2):
I don't really know what to make of this or how much it will affect the downstream analysis. I'm very new to Methyl-Seq, but the ultimate goal is to perform a case-control study to identify differentially methylated CpG sites (I was thinking of using methylKit).
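As a toy illustration of what testing a single CpG site for differential methylation involves, the sketch below computes the methylation percentage from read counts and a two-sided Fisher's exact test on pooled case versus control counts. This is only an assumption-laden sketch in plain Python: the function names and the counts are hypothetical, and it is not a description of what methylKit does internally for multi-sample designs.

```python
from math import comb


def methylation_percent(meth, unmeth):
    """Percentage of methylated calls at one CpG site."""
    total = meth + unmeth
    if total == 0:
        raise ValueError("site has no coverage")
    return 100.0 * meth / total


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (case, control); columns are methylated and
    unmethylated read counts. Sums the probabilities of all tables that
    are no more likely than the observed one, with margins fixed.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x methylated reads in the first row
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_value = 0.0
    for x in range(lowest, highest + 1):
        p = prob(x)
        # Small tolerance so that ties with the observed table are counted
        if p <= p_obs * (1 + 1e-9):
            p_value += p
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical pooled counts at one CpG: (methylated, unmethylated)
    case = (18, 2)
    control = (9, 11)
    print(methylation_percent(*case), methylation_percent(*control))
    print(fisher_exact_two_sided(case[0], case[1], control[0], control[1]))
```

In a real analysis this would be repeated over every covered CpG with multiple-testing correction, and run-to-run differences such as the ones in the reports would have to be accounted for as a possible batch effect.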
Am I doing something wrong, or is this a problem with the initial data?
Any insight will be appreciated.