Why is there a large bias between read counts on the positive and negative strands after aligning with Bismark?
3.7 years ago
daisy.ko ▴ 30

I have paired-end data from the "SureSelect Methyl-Seq Target Enrichment System", which, as far as I understand, is a kind of bisulfite sequencing data produced in a similar manner to RRBS.

However, after aligning with Bismark using the default Bowtie2, I found a very large bias between the number of reads mapped to the positive strand and the number mapped to the negative strand, as shown below:

Number of sequence pairs with unique best (first) alignment came from the bowtie output:
CT/GA/CT:       1073089 ((converted) top strand)
GA/CT/CT:       6013    (complementary to (converted) top strand)
GA/CT/GA:       94492   (complementary to (converted) bottom strand)
CT/GA/GA:       24124100        ((converted) bottom strand)
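
To put the imbalance in numbers, here is a minimal Python sketch that turns the four pair counts above into fractions of all uniquely aligned pairs. The counts are copied from the report; the short labels OT, CTOT, CTOB and OB are the usual Bismark names for the four strands, and the variable names are just illustrative.

```python
# Pair counts copied from the Bismark alignment report above.
# Keys are the usual short strand names; the report's own labels are in the comments.
counts = {
    "OT": 1073089,    # CT/GA/CT: (converted) top strand
    "CTOT": 6013,     # GA/CT/CT: complementary to (converted) top strand
    "CTOB": 94492,    # GA/CT/GA: complementary to (converted) bottom strand
    "OB": 24124100,   # CT/GA/GA: (converted) bottom strand
}

total = sum(counts.values())
fractions = {strand: n / total for strand, n in counts.items()}

for strand, frac in fractions.items():
    print(f"{strand:5s} {counts[strand]:>10d} {frac:8.2%}")

# Ratio of bottom-strand to top-strand pairs.
ob_to_ot = counts["OB"] / counts["OT"]
print(f"OB/OT ratio: {ob_to_ot:.1f}")
```

With these counts, more than 95% of the uniquely aligned pairs come from the bottom strand, and the two complementary strands together account for well under 1%.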


I know I should probably use the default directional option, having seen that the non-directional option leads to output like this, but I am showing it to illustrate how big the discrepancy is between alignments to the top strand and alignments to the bottom strand.

After running methylation extraction and generating the genome-wide cytosine report, I get something like the following:

chr12   120250754       +       1       1       CG      CGC
chr12   120250755       -       103     9       CG      CGC
chr12   120250870       +       0       0       CG      CGG
chr12   120250871       -       59      2       CG      CGT
chr12   120250913       +       0       0       CG      CGA
chr12   120250914       -       27      0       CG      CGA
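
For reference, the columns in this report are chromosome, position, strand, methylated count, unmethylated count, C context and trinucleotide context. Below is a rough Python sketch of one way to get a single coverage value per CpG by pooling the two strands: the "-" entry at position p+1 is added to the "+" entry at position p. This assumes CpG methylation is symmetric, so that pooling strands is acceptable; the function name `merge_cpg_strands` is made up for illustration and is not part of Bismark.

```python
from collections import defaultdict

# The six report lines shown above (whitespace-separated).
REPORT = """\
chr12 120250754 + 1 1 CG CGC
chr12 120250755 - 103 9 CG CGC
chr12 120250870 + 0 0 CG CGG
chr12 120250871 - 59 2 CG CGT
chr12 120250913 + 0 0 CG CGA
chr12 120250914 - 27 0 CG CGA
"""

def merge_cpg_strands(lines):
    """Pool methylated/unmethylated counts of both strands per CpG.

    Hypothetical helper: a CpG is keyed by the position of its '+' strand C,
    so a '-' strand entry at position p is assigned to the CpG at p - 1.
    """
    merged = defaultdict(lambda: [0, 0])
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        chrom, pos, strand, meth, unmeth, context = fields[:6]
        if context != "CG":
            continue  # only CpG context is symmetric across strands
        pos = int(pos)
        key = (chrom, pos if strand == "+" else pos - 1)
        merged[key][0] += int(meth)
        merged[key][1] += int(unmeth)
    return dict(merged)

merged = merge_cpg_strands(REPORT.splitlines())
for (chrom, pos), (meth, unmeth) in sorted(merged.items()):
    coverage = meth + unmeth
    pct = 100.0 * meth / coverage if coverage else float("nan")
    print(chrom, pos, coverage, f"{pct:.1f}")
```

In this excerpt nearly all of the pooled coverage comes from the "-" strand entries, so the merged value is essentially the bottom-strand coverage.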


As far as I understand, there should not be such a large bias between the number of reads on the positive strand and on the negative strand. I trimmed my data before running the alignment and used almost default settings with Bismark. I would like to know whether this kind of result is caused by data quality or by the way I am handling the data. If it is caused by data quality, is there a way to deal with it?

Thank you!

alignment sequencing bismark • 1.3k views

I'm working with the same kit and get the same results using trim_galore with --paired and bismark with bowtie2 and -N 1 (directional is the default). No idea why...


It seems that we are having the same problem... I am not sure whether it is because of data quality. Do you also have a larger number of aligned reads on the bottom strand?

3.7 years ago

Hi both,

To my knowledge this is exactly what one would expect from a target enrichment with the Agilent SureSelect kit. The reason is that most of the SureSelect probes target the forward strand sequence in the initial pull-down reaction, so the sequences that end up in your library are preferentially sequences from the bottom strand; these sequences consequently map to the OB (original bottom) strand. Some probes also target bottom strand sequences, and these will map to the OT (original top) strand.

I would simply run Bismark in default (= directional) mode instead of --non_directional, so you don't contaminate the results with spurious alignments to the CTOT and CTOB strands.


Hi Felix,

I think that I should post it here ...

Thanks for answering. So in that case, I should use the default directional setting with Bismark. However, I found that even with this setting there is an extreme difference between the number of OB and OT reads. How should coverage be calculated when there is such a huge difference? Thank you for answering =)