Question: How Much Is Too Much For 5 Prime End Methylation Rate In RRBS Data?
5.1 years ago by samsara (The Earth)
samsara wrote:

I have RRBS FASTQ files and used Bismark to perform methylation calling. The resulting M-bias plot is shown below. The methylation rate at the first three bases of the 5' end is quite high. The methylation counts and rates for the first four positions are listed below.

My questions are:

  1. Is the observed high methylation rate due to end-repair bias?
  2. The literature mentions that a high methylation rate at the 5' end is common, but how much is too much?
  3. The first three bases of RRBS reads are either CGG or TGG, depending on their methylation state. Is it a good idea to chop off the first 3 bases? If so, doesn't removing the C (which retains the original genomic methylation state) influence downstream analysis?
CpG context
===========
position    count methylated    count unmethylated    % methylation    coverage
1           5000734             2489532               66.76            7490266
2           430                 206                   67.61            636
3           190                 131                   59.19            321
4           34174               79253                 30.13            113427
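
As a quick sanity check, the percentages and coverage in the table can be recomputed from the raw counts. The snippet below is only a minimal sketch; the function name `methylation_summary` is made up for illustration and is not part of Bismark. It also makes it easy to see that positions 2 and 3 have very low coverage (636 and 321 calls), so their rates are much noisier than those at positions 1 and 4.

```python
# Counts copied from the CpG-context table above:
# (position, count methylated, count unmethylated)
rows = [
    (1, 5000734, 2489532),
    (2, 430, 206),
    (3, 190, 131),
    (4, 34174, 79253),
]


def methylation_summary(methylated, unmethylated):
    """Return (coverage, percent methylation) for one read position."""
    coverage = methylated + unmethylated
    # Guard against positions with no calls at all.
    percent = 100.0 * methylated / coverage if coverage else float("nan")
    return coverage, percent


summary = {pos: methylation_summary(m, u) for pos, m, u in rows}

for pos, (coverage, percent) in summary.items():
    print(f"position {pos}: coverage={coverage}, methylation={percent:.2f}%")
```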

[Image: M-bias plot, not available]

5.1 years ago by Devon Ryan (Freiburg, Germany)
Devon Ryan wrote:
  1. Yes, the elevated methylation in the first 3 bases or so is likely due to end-repair. Alternatively, it could be due to incorrect trimming (trim_galore is good for this, and this case is mentioned in the Bismark user guide).
  2. There's no objective answer to this. In Bison, the methylation bias tools suggest ignoring regions according to a p-value: the likelihood of observing a deviation at least that extreme from the methylation profile of the middle of the reads (with a minimum percentage difference, which I have default to 1%). That's similar to what the BSeqQC package does.
  3. Yes, any time you have a skewed graph like this, you should ignore (or remove, depending on the tools) those regions. Unfortunately, in RRBS this may remove a large portion of the methylation calls.
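
The idea in point 2 (compare each position against a reference profile and flag it only if the difference is both statistically unlikely and larger than a minimum percentage) can be sketched roughly as follows. This is not Bison's or BSeqQC's actual implementation: the two-proportion z-test, the function name `flag_biased_position`, and the `alpha` threshold are all assumptions chosen for illustration. Only the 1% minimum difference comes from the answer above.

```python
import math


def flag_biased_position(meth, unmeth, base_meth, base_unmeth,
                         min_diff=0.01, alpha=0.01):
    """Two-proportion z-test of one read position against a baseline.

    Returns (difference, p_value, flagged). `alpha` is an illustrative
    threshold (an assumption), not a default taken from any tool.
    """
    n1 = meth + unmeth
    n2 = base_meth + base_unmeth
    if n1 == 0 or n2 == 0:
        return float("nan"), float("nan"), False
    p1 = meth / n1
    p2 = base_meth / n2
    pooled = (meth + base_meth) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both proportions are identical (all 0 or all 1): no evidence of bias.
        return p1 - p2, 1.0, False
    z = (p1 - p2) / se
    # Two-sided tail probability under the normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    flagged = abs(p1 - p2) >= min_diff and p_value < alpha
    return p1 - p2, p_value, flagged


# Position 1 from the question's table, compared against position 4.
# Position 4 only stands in for the read interior because it is the other
# well-covered position in the table; a real check would use the middle
# of the reads as the baseline.
diff, p_value, flagged = flag_biased_position(5000734, 2489532, 34174, 79253)
print(f"difference={diff:.4f}, p={p_value:.3g}, flagged={flagged}")
```

With counts this large, almost any difference is statistically significant, which is why a minimum effect size such as the 1% difference matters as much as the p-value.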

For RRBS, the majority of reads start with CGG or TGG at the 5' end, which is what remains of the MspI cut site. The M-bias plot shows the methylation percentage at each base position. The first base has a higher probability of being methylated, while other positions may not even contain a C and therefore show a low methylation percentage. Does it make sense to trim the first three bases in this case?
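
The CGG/TGG observation can be sketched as a small classifier over read prefixes. This is a minimal illustration assuming a directional library and bisulfite-converted reads; the function name `classify_rrbs_read_start` and the example reads are made up and are not taken from any tool or dataset.

```python
from collections import Counter


def classify_rrbs_read_start(read):
    """Classify a bisulfite-converted RRBS read by its first three bases."""
    prefix = read[:3].upper()
    if prefix == "CGG":
        return "methylated"      # the C survived bisulfite conversion
    if prefix == "TGG":
        return "unmethylated"    # the C was converted and reads as T
    return "other"               # read does not start at an MspI site


# Made-up reads, for illustration only.
reads = ["CGGTTTAGGT", "TGGATTTTAG", "CGGGTTTTGA", "ATTTGGTTAG"]
counts = Counter(classify_rrbs_read_start(r) for r in reads)
print(counts)
```

Because nearly every read contributes a CpG call at position 1, that position has far higher coverage than positions 2 and 3, which matches the coverage column in the question's table.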

Trim_galore with the --rrbs option trims another 2 bp from the 3' end to remove the filled-in (end-repair introduced) unmethylated Cs.

I read this here: http://www.bioinformatics.babraham.ac.uk/projects/bismark/RRBS_Guide.pdf

Thank you, Ming

written 2.9 years ago by Ming Tang

I realize this is a very delayed follow-up, but I was hoping you might clarify #1. Shouldn't end-repair affect the 2 bases at the end of the read, not the first 3 bases?

written 2.3 years ago by igor

One would think so, yes, but for some reason the third base seems to be affected at least sometimes too. No clue why.

written 2.3 years ago by Devon Ryan

But still, shouldn't it be the 2 or 3 bases at the end, not the beginning of the read?

For example, end-repair causes problems at the end of the molecule, and thus the beginning of R2 is wrong for WGBS. Or is that a separate issue?

written 2.3 years ago by igor