Methylation, Bisulfite sequencing: Reads, Alignment and Samtools tview
5.6 years ago
Shicheng Guo ★ 8.7k

How should reads mapped to the minus strand (Crick strand) in bisulfite-seq be understood, and what do such reads look like?

I understand the C->T changes in reads mapped to the positive (forward) strand. However, I do not understand why reads mapped to the minus (reverse) strand look the same as the positive reference strand, except for G->A changes.

How does this work for bisulfite sequencing?

"Reads from the forward strand are displayed as their original sequence, while reads from the reverse strand are displayed reverse complemented. Why are the two kinds of reads displayed according to different principles, and how can they be distinguished before alignment?"

As for Figure 1c, does anyone know how to display the raw reference and the bisulfite-converted reference together?

Figures 1a and 1b show the same example in two different display modes of samtools tview:

Figure 1a.

Figure 1b.

Figure 1c

Eventually, I think I know what happened here. Thanks all the same for your help.

As the strand identity of a bisulfite read is a priori unknown, our bisulfite mapping tool Bismark aims to find a unique alignment by running four alignment processes simultaneously. First, bisulfite reads are transformed into a C-to-T and G-to-A version (equivalent to a C-to-T conversion on the reverse strand). Then, each of them is aligned to equivalently pre-converted forms of the reference genome using four parallel instances of the short read aligner Bowtie (Fig. 1A). This read mapping enables Bismark to uniquely determine the strand origin of a bisulfite read. Consequently, Bismark can handle BS-Seq data from both directional and nondirectional libraries. Since residual cytosines in the sequencing read are converted in silico into a fully bisulfite-converted form before the alignment takes place, mapping performed in this manner handles partial methylation accurately and in an unbiased manner.
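The four-way alignment described in that passage can be illustrated with a toy sketch. This is not Bismark's code: exact string matching stands in for Bowtie, the function names are made up for illustration, and the strand labels follow the usual naming of the four possible bisulfite strands. The example sequences are the ones used in the answer below.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def c_to_t(seq):
    """Fully convert C to T in silico."""
    return seq.replace("C", "T")

def g_to_a(seq):
    """Fully convert G to A in silico (a C-to-T conversion seen from the other strand)."""
    return seq.replace("G", "A")

def assign_strand(read, genome):
    """Toy strand assignment: try four read/genome combinations and
    return (label, position) only if exactly one of them matches."""
    ct_genome = c_to_t(genome)
    ga_genome = g_to_a(genome)
    attempts = [
        ("original top", c_to_t(read), ct_genome),
        ("original bottom", revcomp(c_to_t(read)), ga_genome),
        ("complementary to original top", revcomp(g_to_a(read)), ct_genome),
        ("complementary to original bottom", g_to_a(read), ga_genome),
    ]
    hits = [(label, target.find(query))
            for label, query, target in attempts if query in target]
    return hits[0] if len(hits) == 1 else None

genome = "ACTAGCTAGCTAGCCGATC"
top_read = "ACTAGCTAGCTAGCTGATC"     # from the + strand, as sequenced
bottom_read = "GATTGGCTAGCTAGCTAGT"  # from the - strand, as sequenced (5'->3')

print(assign_strand(top_read, genome))
print(assign_strand(bottom_read, genome))
```

For a directional library only the first two combinations would be needed, which is why the number of alignments depends on the library type.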

5.6 years ago

Question: What does the reverse complement of a sequence with a C->T conversion look like?

Answer: The normal reverse complement but with a G->A conversion where the C->T takes place (after all, the complement of C is G and that of T is A).

Example:

Suppose you sequence the reference below, in which the single CpG (positions 15-16, colored red in the original post) is unmethylated. You happen to have two separate fragments covering this location, one that arose from the + strand and the other from the - strand. You would then see the following:

5' ACTAGCTAGCTAGCTGATC 3' Forward read
5' ACTAGCTAGCTAGCCGATC 3' Reference
3' TGATCGATCGATCGGTTAG 5' Reverse read

The reverse read is reverse complemented when it's displayed, so the C->T transition on the - strand will appear as G->A on the + strand. Note that each fragment only conveys information about a single strand (this is true for all current NGS technologies, though that fact is only important in bisulfite sequencing).
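As a quick check of the example above, here is a small Python sketch (the helper names are just for illustration). It takes the reverse read as it would come off the sequencer (written 5'->3'), reverse complements it the way a viewer displays it, and lists where each displayed read differs from the reference.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def mismatches(reference, displayed):
    """List (1-based position, reference base, displayed base) for each difference."""
    return [(i + 1, r, d)
            for i, (r, d) in enumerate(zip(reference, displayed))
            if r != d]

reference = "ACTAGCTAGCTAGCCGATC"
forward_read = "ACTAGCTAGCTAGCTGATC"   # + strand read, displayed as sequenced
reverse_read = "GATTGGCTAGCTAGCTAGT"   # - strand read as sequenced, 5'->3'

displayed_reverse = revcomp(reverse_read)  # what the viewer shows on the + strand

print(mismatches(reference, forward_read))       # [(15, 'C', 'T')]
print(mismatches(reference, displayed_reverse))  # [(16, 'G', 'A')]
```

The forward read differs at the C of the CpG (C->T), and the displayed reverse read differs at the G of the same CpG (G->A).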


So the forward read is displayed as-is, while the reverse read is reverse complemented for display? Then the question is: how do we know whether a read comes from the forward or the reverse strand before alignment? Does NGS sequencing provide any prior information about which strand a fragment comes from?

Anyway, in the current example, reads from the forward strand are displayed as their original sequence, while reads from the reverse strand are displayed reverse complemented. Why are the two kinds of reads displayed according to different principles, and how can they be distinguished before alignment?


"How do we know it is from forward or reverse chain before alignment?"

We don't. You end up needing to align each read/pair to 2-4 versions of the genome (depending on the library type). This is one of the annoyances of BSseq.


OK. Please check whether I followed you. The overall process is: all reads are aligned to 2 or 4 references, and we obtain the best alignment for each read. If a read aligns to the forward (positive) strand, its sequence is shown as it appears in the FASTQ file. Otherwise, a read that maps to the reverse (negative) strand is reverse complemented before being shown, and it is flagged as a reverse-strand read whose sequence has been reverse complemented. Am I right?


That sounds right. I think Felix Krueger has a nice illustration of this in his RRBS Guide. I find this easier to understand with some visual representations like those around page 6.