Question: Sequence Bias Bismark Output Interpretation
fusion.slope wrote, 2.3 years ago:

Hello Community,

I have generated the M-bias plot with Bismark, and my reads look as follows.

[Figure: M-bias plot, "CpG Bias"]

As far as I understand, the end of Read 1 shows a decrease at the 3' end for the CHG and CHH total calls (orange and green lines, respectively). The strange thing is Read 2, where I observe a drop immediately after the first 5 nucleotides for all of the methylation series (CHH total, CHG total, CHH methylation, etc.). How should I interpret this? To me it looks like Read 2 is quite bad in terms of nucleotide composition, and only the CpG methylation shows a good pattern. Has anyone experienced a similar problem?

In the Bismark user guide, Read 1 and Read 2 clearly show a similar pattern (see page 16 of https://www.bioinformatics.babraham.ac.uk/projects/bismark/Bismark_User_Guide.pdf).

Any comment is really appreciated. Cheers

Written 2.3 years ago by fusion.slope (modified 12 months ago)

Hi, we ran into the same problem. Did you solve yours? Thank you!

Reply written 2.1 years ago by Jingyue

fusion.slope wrote, 2.3 years ago:

I found a good answer here:

https://sequencing.qcfail.com/articles/library-end-repair-reaction-introduces-methylation-biases-in-paired-end-pe-bisulfite-seq-applications/

TEman (Sweden) wrote, 15 months ago:

Hi,

What about the gradual decrease in CHH total and CHG total in R2? I see the same gradual decrease over R2 in my dataset. Does anyone have an explanation for this?


Here is the explanation for R2:

https://sequencing.qcfail.com/articles/library-end-repair-reaction-introduces-methylation-biases-in-paired-end-pe-bisulfite-seq-applications/

You can try to get better results by ignoring the first bases of Read 2 during methylation extraction in Bismark:

bismark_methylation_extractor --ignore_r2 2 --gzip sample1_bismark_bt2_pe.bam
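
One way to pick the value passed to --ignore_r2 is to look at the per-position methylation values behind the M-bias plot and count how many leading positions sit far from the rest of the read. The following is a minimal Python sketch of that idea, assuming the per-position percentages have already been collected into a list. The function name, the example numbers, and the 10-point tolerance are illustrative assumptions, not Bismark output or Bismark defaults.

```python
from statistics import median

def leading_biased_positions(meth_pct, tolerance=10.0):
    """Count leading read positions whose methylation deviates from the read median.

    meth_pct: per-position methylation percentages, 5' end first.
    tolerance: allowed absolute deviation in percentage points (assumed value).
    """
    baseline = median(meth_pct)
    count = 0
    for value in meth_pct:
        if abs(value - baseline) <= tolerance:
            break  # first position that looks like the rest of the read
        count += 1
    return count

# Hypothetical Read 2 CpG methylation values (percent), not real data.
r2_cpg = [3.0, 35.0, 68.0, 70.0, 71.0, 69.5, 70.2, 70.8, 69.9, 70.1]
n_ignore = leading_biased_positions(r2_cpg)
print(f"suggested --ignore_r2 value: {n_ignore}")
```

The plot itself should still be inspected by eye; a fixed tolerance like this is only a rough guide.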

Reply written 15 months ago by fusion.slope

TEman (Sweden) wrote, 15 months ago:

Thank you for your answer.

I get the bias at the very start (drop of methylated C). I have already trimmed the leading bases of R2 reads.

However, my concern is the gradual decrease in the CHH Total Calls and CHG Total Calls. Here is the QC from Bismark User Guide (https://rawgit.com/FelixKrueger/Bismark/master/Docs/Bismark_User_Guide.html):

[Figure: M-bias plot, "R2_CHH total bias"]

I cannot find any discussion or explanation of this bias. Is it all due to trimming of low-quality bases at the 3' end of R2? The read length distribution of my trimmed R2 reads does not follow this pattern, as the vast majority of them are of the maximum length.
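
The trimming hypothesis can be checked with simple arithmetic: if 3' trimming alone caused the decline, the total calls at position i should scale roughly with the fraction of reads that are at least i bases long. Below is a minimal Python sketch of that check. The length histogram is made up for illustration and the function name is hypothetical.

```python
def fraction_reads_covering(length_counts, max_length):
    """For each position 1..max_length, return the fraction of reads at least that long."""
    total = sum(length_counts.values())
    covering = []
    for pos in range(1, max_length + 1):
        n = sum(count for length, count in length_counts.items() if length >= pos)
        covering.append(n / total)
    return covering

# Hypothetical trimmed Read 2 length histogram {read length: number of reads}.
lengths = {100: 900, 90: 50, 75: 30, 50: 20}
cov = fraction_reads_covering(lengths, 100)
print(cov[0], cov[74], cov[99])
```

With most reads at full length, as in this made-up histogram, the expected coverage at the last position is still 90% of that at the first, so trimming of this kind would not account for a several-fold drop in total calls.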

Written 15 months ago by TEman (modified 15 months ago)

Hi TEman,

I came across a similar decrease in the total number of C calls, and I share your concern. I tried processing R2 as single-end and the decrease disappeared.

I asked about it, and it turns out that this pattern arises because the overlap between R1 and R2 is ignored during methylation extraction, to avoid scoring methylation twice in the same fragment (--no_overlap option). Only R1 is used in the overlapping region. Try running the analysis with --include_overlap; the decrease should disappear. That said, I think --no_overlap should always be used with PE reads, as recommended.
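
The effect of overlap removal on the total calls can be illustrated with a simple geometric model. This is a minimal Python sketch, not Bismark's implementation: it assumes every read is full length, that Read 1 covers fragment coordinates 1 to read_length, and that Read 2 is read inward from the opposite end of the fragment. The fragment lengths are made up for illustration.

```python
def r2_fraction_kept(fragment_lengths, read_length):
    """Fraction of fragments in which Read 2 position i is not covered by Read 1.

    Read 2 position i (1-based, from its 5' end) maps to fragment coordinate
    fragment_length - i + 1, while Read 1 covers coordinates 1..read_length.
    The position survives overlap removal only if
    fragment_length >= read_length + i.
    """
    total = len(fragment_lengths)
    kept = []
    for i in range(1, read_length + 1):
        n = sum(1 for frag in fragment_lengths if frag >= read_length + i)
        kept.append(n / total)
    return kept

# Hypothetical fragment (insert) lengths in bp, not real data.
fragments = [110, 120, 140, 160, 180, 200, 220, 250]
kept = r2_fraction_kept(fragments, read_length=100)
print(kept[0], kept[49], kept[99])
```

In this model the later a position lies in Read 2, the more fragments are short enough for Read 1 to reach it, so the fraction of Read 2 calls that are kept falls gradually along the read.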

Reply written 12 months ago by Kramdi

If you read the link I sent you, it is explained there (even though the explanation is not very clear):

"The methylation state of the first couple of bases or Read 2 drops from the average ~70% down to ~3%, and this steep drop can certainly be explained by the filled-in unmethylated cytosines. Since there is no reason to believe that there would be a biological role for lower methylation at the start of reads we have to assume that this low-methylation is artefactual, and thus leaving this untreated will introduce, in this case several hundred thousand, incorrect methylation calls (and thus noise) into the dataset."

Reply written 15 months ago by fusion.slope

I am not sure if we understand each other. I have read that link.

Ignore the first 10 bases. Ignore the methylation. Just look at the CHH Total Calls line. It goes from ~4,500k at the beginning to ~1000-500k at the end. What happens in between? Why are there gradually fewer total CHH calls over the read?

Reply written 15 months ago by TEman

Yes, I got your point, and sorry that I did not grasp your full concern the first time. What I can tell you from my experience is that in Read 2 there is always that kind of drop for CHG and CHH; why, I do not know. Since I was mostly interested in CpG (where the methylation values are stable along R2), I did not dig deeper into that problem.

Reply written 15 months ago by fusion.slope (modified 15 months ago)